Introduction


This report presents the results of the investigations conducted over a period of four years on control algorithms designed for stochastic systems. The main feature of these algorithms is that they account for:

(i) the current uncertainty in the system, and

(ii) the anticipated future uncertainty in the system, which is, in general, control-dependent.


The first feature gives the control the "cautious" property, which minimizes the effect of the current uncertainties on the system's performance.
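Here is a minimal illustration of caution. It uses a hypothetical scalar plant $y = b u + e$ that is not taken from this report. Suppose the gain $b$ is uncertain, $b \sim N(\hat b, P)$, and $e$ has zero mean. Then $E[(y - y_r)^2] = (\hat b u - y_r)^2 + P u^2 + \mathrm{var}(e)$. Its minimizer, $u = \hat b\,y_r/(\hat b^2 + P)$, is smaller in magnitude than the certainty-equivalence control $y_r/\hat b$ whenever $P > 0$. The function names below are illustrative.

```python
def certainty_equivalence_control(b_hat: float, y_ref: float) -> float:
    """Treat the estimate b_hat as if it were the true gain (hypothetical example)."""
    return y_ref / b_hat


def cautious_control(b_hat: float, p_b: float, y_ref: float) -> float:
    """Minimize E[(b*u + e - y_ref)^2] with b ~ N(b_hat, p_b), E[e] = 0.

    The expected cost is (b_hat*u - y_ref)^2 + p_b*u^2 + var(e); setting its
    derivative with respect to u to zero gives u = b_hat*y_ref / (b_hat^2 + p_b).
    """
    return b_hat * y_ref / (b_hat ** 2 + p_b)


if __name__ == "__main__":
    # The cautious control shrinks toward zero as the gain uncertainty p_b grows.
    for p_b in (0.0, 0.5, 2.0):
        print(p_b, certainty_equivalence_control(1.0, 2.0), cautious_control(1.0, p_b, 2.0))
```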

The second feature allows the control to affect not only the system's state but also the system's uncertainty. Such a controller is called a "dual controller" because, by taking advantage of its "dual effect", it has the capability of reducing the future uncertainties.
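The same hypothetical scalar setting shows the dual effect: $y = b u + e$, with $b \sim N(\hat b, P)$ and noise variance $\sigma^2$. Under the standard Gaussian (Kalman) update of $b$, the posterior variance is $P\sigma^2/(\sigma^2 + u^2 P)$. It depends on the applied control $u$ but not on the observed value $y$, so a larger $|u|$ reduces the future uncertainty. A minimal sketch, with an illustrative function name:

```python
def posterior_gain_update(b_hat, p_b, u, y, sigma2):
    """Gaussian update of b ~ N(b_hat, p_b) after observing y = b*u + e,
    e ~ N(0, sigma2). Returns the posterior mean and variance of b."""
    s = u * u * p_b + sigma2          # innovation variance
    k = p_b * u / s                   # update gain
    b_new = b_hat + k * (y - b_hat * u)
    p_new = p_b - k * u * p_b         # equals p_b*sigma2 / (sigma2 + u^2*p_b)
    return b_new, p_new
```

With `u = 0` nothing is learned about $b$, while the posterior variance falls as $|u|$ grows, whatever the value of `y`.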

These uncertainties can pertain to the system's state or to its unknown parameters. Both continuous-valued and discrete-valued uncertainties have been considered.


The next section summarizes the major results of the research effort that have been published in leading control journals and presented at major national and international conferences. In the following, an outline of each publication is given; the papers appear in the Appendix.


2.1 C. J. Wenk and Y. Bar-Shalom, "A Multiple Model Adaptive Dual Control Algorithm for Stochastic Systems with Unknown Parameters," IEEE Trans. Automatic Control, vol. AC-25, pp. 703-710, Aug. 1980.

In this work an adaptive dual control algorithm is presented for linear stochastic systems with constant but unknown parameters. The system parameters are assumed to belong to a finite set on which a prior probability distribution is available. The tool used to derive the algorithm is preposterior analysis: a probabilistic characterization of the future adaptation process allows the controller to take advantage of the dual effect. The resulting actively adaptive control, called model adaptive dual (MAD) control, is compared to two passively adaptive control algorithms: the heuristic certainty equivalence (HCE) and the Deshpande-Upadhyay-Lainiotis (DUL) model-weighted controllers. An analysis technique developed for the comparison of different controllers is used to show a statistically significant improvement in the performance of the MAD algorithm over those of the HCE and DUL.

2.2 Y. Bar-Shalom, "Stochastic Dynamic Programming: Caution and Probing," IEEE Trans. Automatic Control, vol. AC-26, pp. 1184-1195, Oct. 1981.

The purpose of this paper is to unify the concepts of caution and probing put forth by Feldbaum with the mathematical technique of stochastic dynamic programming originated by Bellman. The recently developed decomposition of the expected cost in a stochastic control problem is used to assess quantitatively the caution and probing effects of the system uncertainties on the control. It is shown how, in some problems, the uncertainties make the control cautious (less aggressive), while in other problems the control will probe (by becoming more aggressive) in order to enhance the estimation/identification while controlling the system. Following this, a classification of stochastic control problems according to the dominant effect is discussed. This classification is then used to point out the stochastic control problems where substantial improvements can be expected from using a sophisticated algorithm instead of a simple one.
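A small numerical sketch can make the caution/probing distinction concrete. The example is hypothetical and not taken from the paper. It extends a scalar plant $y(k+1) = b\,u(k) + e(k+1)$, with $b \sim N(\hat b, P)$, to two stages. It assumes the second control is chosen cautiously from the updated estimate of $b$. The expected second-stage cost is then evaluated by Gauss-Hermite quadrature over the preposterior distribution of that estimate. Any gap between the two-stage minimizer and the myopic cautious control reflects the value the first control places on learning. The helper names are illustrative.

```python
import numpy as np


def two_stage_cost(u0, b_hat, p_b, y_ref, sigma2, n_nodes=40):
    """Expected E[(y1 - y_ref)^2 + (y2 - y_ref)^2] for the hypothetical plant
    y(k+1) = b*u(k) + e(k+1), b ~ N(b_hat, p_b), e ~ N(0, sigma2), when u0 is
    applied first and the second control is cautious given the updated estimate."""
    # Stage 1: squared bias + parameter spread + measurement noise.
    stage1 = (b_hat * u0 - y_ref) ** 2 + p_b * u0 ** 2 + sigma2
    # Posterior variance after y1 depends on u0 but not on the value of y1.
    p1 = p_b * sigma2 / (sigma2 + u0 ** 2 * p_b)
    # Preposterior law of the updated estimate b1 is N(b_hat, p_b - p1).
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    b1 = b_hat + np.sqrt(max(p_b - p1, 0.0)) * x
    # Minimal cautious stage-2 cost for a posterior (b1, p1).
    stage2 = y_ref ** 2 * p1 / (b1 ** 2 + p1) + sigma2
    # hermegauss weights integrate against exp(-x^2/2); normalize to N(0, 1).
    return stage1 + float(np.sum(w * stage2)) / np.sqrt(2.0 * np.pi)


def compare_first_controls(b_hat, p_b, y_ref, sigma2, n_grid=2001):
    """Return (myopic cautious u0, grid minimizer of the two-stage cost)."""
    u_cautious = b_hat * y_ref / (b_hat ** 2 + p_b)
    half_width = 3.0 * abs(u_cautious) + 1.0
    # Include the cautious control itself so the comparison is on a common grid.
    grid = np.append(np.linspace(-half_width, half_width, n_grid), u_cautious)
    costs = [two_stage_cost(u, b_hat, p_b, y_ref, sigma2) for u in grid]
    return u_cautious, float(grid[int(np.argmin(costs))])
```

A grid search is used only to keep the sketch dependency-free. Any scalar minimizer would do.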

2.3 Y. Bar-Shalom and J. A. Molusis, "Stochastic Control and Identification Enhancement for the Flutter Suppression Problem," Proc. 8th IFAC World Congress, Kyoto, Japan, Aug. 1981.

The topic of this paper is the application of some recent results in stochastic control to an aerospace problem where there are large uncertainties in the dynamics of the plant to be controlled. An approximation to the stochastic dynamic programming is considered that results in an adaptive control of the "closed-loop" type. It utilizes feedback (the latest state and parameter estimates and their uncertainties) as well as their anticipated future uncertainties; that is, it anticipates (subject to causality) subsequent feedback. This algorithm allows the control to enhance the parameter identification in real time. This is done using the control's dual effect: the control can affect the state as well as the (augmented) state uncertainty, and thus can reduce the uncertainty about some parameters. A flight control application in which stochastic adaptive control appears to offer a significant payoff is the active control of aircraft wing-store flutter. Improved flutter suppression can be accomplished with an adaptive controller that has the capability to learn and identify the flutter modes during the flight.


2.4 C. J. Wenk and Y. Bar-Shalom, "Model Adaptive Dual Control of MIMO Stochastic Systems," Proc. 20th IEEE Conf. on Decision and Control, San Diego, CA, Dec. 1981.

An adaptive dual control algorithm is presented for multiple-input, multiple-output (MIMO) linear systems with input and output noise and unknown parameters. The system parameters are assumed to belong to a finite set on which a prior probability distribution is available. The dynamic programming requires a characterization of the future evolution of the MIMO system information; the difficulties in obtaining it are overcome through a novel way of using preposterior analysis. This provides a probabilistic characterization of the future adaptation process and allows the controller to take advantage of the dual effect.

2.5 Y. Bar-Shalom, P. Mookerjee and J. A. Molusis, "A Linear Feedback Dual Controller for a Class of Stochastic Systems," Proc. CNRS Colloq. Automatique, Belle-Ile, France, Sept. 1982.

The methodology for deriving a dual control algorithm that has a linear feedback form is presented. This control, while simple, has the capability of enhancing the identification of the system's unknown parameters. A dual controller for a plant describing the helicopter higher harmonic vibration control problem is presented together with simulation results.

2.6 K. Birmiwal and Y. Bar-Shalom, "Dual Control Guidance for Simultaneous Identification and Interception of a Target," Automatica 20:737-749, Nov. 1984.

An adaptive dual-control guidance algorithm is presented for intercepting a moving target in the presence of an interfering target (decoy) in a stochastic environment. Two sequences of measurements are obtained at discrete points in time; however, it is not certain which sequence came from the target of interest and which from the decoy. With each track, the interceptor also receives noisy, state-dependent feature measurements. The optimum control for the interceptor is given by the solution of the stochastic dynamic programming equation, but it is not numerically feasible to obtain. An approximate solution of this equation is obtained by evaluating the value of the future information gathering. This is done through the use of preposterior analysis: approximate prior probability densities are obtained and used to describe the future learning and control. In this way, the interceptor control is used for information gathering in order to reduce the future target and decoy inertial measurement errors and to enhance the observable target/decoy feature differences for subsequent discrimination between the true target and the decoy. Simulation studies have shown the effectiveness of the scheme.

2.7 J. A. Molusis, P. Mookerjee and Y. Bar-Shalom, "Dual Adaptive Control Based upon Sensitivity Functions," Proc. 23rd IEEE Conf. on Decision and Control, Las Vegas, NV, Dec. 1984.

A new adaptive dual control solution is presented for the control of a class of multivariable input-output systems. Both rapidly varying random parameters and constant but unknown parameters are included. The new controller modifies the cautious control design by numerator and denominator correction terms. This controller is shown to depend upon sensitivity functions of the expected future cost. A scalar example is presented to provide insight into the properties of the new dual controller. Monte Carlo simulations show improvement over the cautious controller and the Linear Feedback Dual Controller.
A Multiple Model Adaptive Dual Control Algorithm
for Stochastic Systems with Unknown Parameters

CARL J. WENK, STUDENT MEMBER, IEEE, AND YAAKOV
BAR-SHALOM, SENIOR MEMBER, IEEE

Abstract: An adaptive dual control algorithm is presented for linear stochastic systems with constant but unknown parameters. The system parameters are assumed to belong to a finite set on which a prior probability distribution is available. The tool used to derive the algorithm is preposterior analysis: a probabilistic characterization of the future adaptation process allows the controller to take advantage of the dual effect. The resulting actively adaptive control, called model adaptive dual (MAD) control, is compared to two passively adaptive control algorithms: the heuristic certainty equivalence (HCE) and the Deshpande-Upadhyay-Lainiotis (DUL) model-weighted controllers. An analysis technique developed for the comparison of different controllers is used to show statistically significant improvement in the performance of the MAD algorithm over those of the HCE and DUL.

I.  Introduction 

In the control of linear stochastic systems with quadratic cost, the certainty equivalence property [6] is known to hold. If, however, there are unknown parameters in the system to be controlled, then certainty equivalence does not hold and the dynamic programming cannot be solved [1]. In this case a control decision is known to affect not just the future state of the system but also the future state and parameter uncertainty. That is, the control has the dual effect, first discussed by Feldbaum [15] and later shown to be intimately related to the certainty equivalence property [6].


Manuscript received April 4, 1979; revised March 4, 1980. Paper recommended by J. L. Speyer, Past Chairman of the Stochastic Control Committee. This work was supported by
Because the parameter uncertainty renders the optimum control solution unattainable, a number of parameter-adaptive suboptimum control strategies have been sought [14], [19], [23], [13], [2], [24]. With the exception of [24], however, most of these strategies are in the passive feedback classification as discussed in [6]. That is, they do not take into account the knowledge that future learning about the unknown system parameters will occur. An algorithm which uses such knowledge to improve its control decisions is called actively adaptive: it takes advantage of the dual effect of the control to improve the identification and, ultimately, the performance.

This paper presents an actively adaptive control algorithm for linear stochastic systems where the vector $\theta$, consisting of the constant but unknown system parameters, is equal to one of the $M$ known model parameter vectors $\theta_j$, $j = 1, \ldots, M$. The assumption that the true system is a member of a discrete set of known model systems has been used for the development of a number of passively adaptive control algorithms [13], [...], [23]. It has received considerable recent attention in, for example, the adaptive flight control problem for the F-8 Digital-Fly-By-Wire aircraft [2], [3]. Performance difficulties have arisen, however, due to the inherently passive learning properties of existing algorithms designed for the multiple model adaptive control problem. The algorithm presented in this paper, with its active learning properties, should represent an advance toward a more sophisticated solution of the multiple model problem.

The actively adaptive control algorithm presented here, called the model adaptive dual (MAD) control algorithm, is developed and studied within the context of controlling the output of a single-input, single-output system, in order to help gain understanding of the dual effect of the control in the multiple model problem. The problem is formulated in Section II. The MAD algorithm for two models is obtained in Sections III-V by constructing an approximate solution to the stochastic dynamic programming, the exact solution of which would give the globally optimum (dual) control. The value of future information gathering is evaluated through the use of preposterior analysis [18]: approximate prior probability densities are obtained and used to describe future learning and control. The extension to $M > 2$ models is presented in Section VI.

Numerical studies and comparisons of the MAD algorithm are made in Section VII with two passive algorithms, the heuristic certainty equivalence (HCE) algorithm and the Deshpande-Upadhyay-Lainiotis (DUL) algorithm, as well as with the optimal controls produced for each model system with known parameters. A rigorous statistical analysis technique is presented for a meaningful comparison of the performances obtained from Monte Carlo simulations employing the above algorithms.¹ Statistical tests performed on the results of a Monte Carlo simulation procedure show that significant performance improvements may be achieved using MAD over HCE and DUL. In the latter algorithm, used in the F-8 aircraft problem in [2], the control is formed as a weighted sum of the model-optimal controls.
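The paper's own statistical analysis technique is not reproduced in this section. As a generic illustration of testing beyond sample means, the sketch below computes a paired $t$ statistic on per-run cost differences. It assumes each Monte Carlo run applies both controllers to the same true model and the same noise sequence (common random numbers). The function name is hypothetical. The resulting statistic would be compared against a Student $t$ quantile with $n-1$ degrees of freedom.

```python
import math
import statistics


def paired_t_statistic(cost_a, cost_b):
    """t statistic for the mean of the paired differences cost_a - cost_b.

    Each pair must come from the same Monte Carlo run (same true model and
    same noise realization) for both controllers."""
    if len(cost_a) != len(cost_b) or len(cost_a) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [a - b for a, b in zip(cost_a, cost_b)]
    mean_d = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if se == 0.0:
        raise ValueError("all paired differences are identical")
    return mean_d / se
```

Pairing removes the run-to-run variability common to both controllers. This is why a difference can be significant even when the two sample means look close.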

Lastly, while the MAD algorithm is designed for eventual on-line computational feasibility, it is more expensive than HCE and DUL. It is also pointed out that MAD has a built-in feature to help determine a priori, in a non-Monte-Carlo fashion, when the performance improvements obtainable with MAD are large enough to warrant the added computing load.

II. Problem Formulation

Consider controlling the linear system described by the input-output model [4]

$$ y(t) = A(q^{-1})\,y(t-1) + B(q^{-1})\,u(t-1) + e(t) \qquad (2.1) $$

where 

$$ A(q^{-1}) = a_1 + a_2 q^{-1} + \cdots + a_n q^{-(n-1)} \qquad (2.2) $$

$$ B(q^{-1}) = b_1 + b_2 q^{-1} + \cdots + b_m q^{-(m-1)} \qquad (2.3) $$

¹To the best knowledge of the authors, past comparisons between different control algorithms were limited to sample means, leaving open the question of statistical significance of the observed differences.


are polynomials in the delay operator $q^{-1}$, defined by $q^{-1} z(t) = z(t-1)$. The system output is $y(t)$, the input is $u(t)$, and $e(t)$ is a zero-mean, white Gaussian disturbance with standard deviation $\sigma$. Part or all of the parameter vector defined by

$$ \theta = [a_1, a_2, \ldots, a_n, b_1, b_2, \ldots, b_m]' \qquad (2.4) $$

is unknown. It is assumed, however, that the true parameter vector $\theta$ is equal to one of $M$ known constant model vectors $\theta_j$, $j = 1, \ldots, M$, with corresponding known a priori probabilities

$$ P[\theta = \theta_j] = \lambda_j(0), \quad j = 1, \ldots, M \qquad (2.5) $$

$$ \sum_{j=1}^{M} \lambda_j(0) = 1 \qquad (2.6) $$

The objective is to determine a sequence of control decisions $\{u(0), u(1), \ldots, u(N-1)\}$ which minimizes

$$ J(0) = E[C(0)] \qquad (2.7) $$

where  the  cost  is  quadratic  about  a  reference  trajectory 

$$ C(t) = \tfrac{1}{2} q(N)[y(N) - y_r(N)]^2 + \tfrac{1}{2} \sum_{\tau=t}^{N-1} \{ q(\tau)[y(\tau) - y_r(\tau)]^2 + r(\tau)[u(\tau) - u_r(\tau)]^2 \} \qquad (2.8) $$

subject to (2.1)-(2.6). The expectation in (2.7) is performed with respect to all random variables in $C(0)$. The quantities $q(t)$, $r(t)$, $y_r(t)$, and $u_r(t)$, $t = 0, 1, \ldots, N$, are all known (time-varying) constants. The information vector at time $t$, $Z(t)$, consists of the sequence of known outputs and control decisions

$$ Z(t) = \{ y(0), y(1), \ldots, y(t), u(0), u(1), \ldots, u(t-1) \}. \qquad (2.9) $$
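For Monte Carlo work of the kind described in this paper, it helps to have a direct simulation of model (2.1) together with the realized cost (2.8). The sketch below is a hypothetical helper, not the authors' code. It assumes zero values before $t = 0$ and takes $y(0)$ as pure noise. It accepts any `policy(y, u)` that sees only the information vector $Z(t)$ of (2.9).

```python
import random


def simulate_and_cost(a, b, policy, n_steps, sigma, q, r, y_ref, u_ref, seed=0):
    """Simulate y(t+1) = A(q^-1) y(t) + B(q^-1) u(t) + e(t+1), i.e. (2.1),
    for t = 0..n_steps-1 and return (C(0) from (2.8), outputs, controls).

    a = [a1, ..., an], b = [b1, ..., bm]; q and y_ref have n_steps + 1
    entries, r and u_ref have n_steps entries."""
    rng = random.Random(seed)
    y = [rng.gauss(0.0, sigma)]      # y(0): assumption, zero past values
    u = []
    for t in range(n_steps):
        u.append(policy(list(y), list(u)))   # policy sees Z(t) only
        # A(q^-1) y(t) = a1 y(t) + a2 y(t-1) + ...; likewise for B(q^-1) u(t)
        a_part = sum(ak * y[t - k] for k, ak in enumerate(a) if t - k >= 0)
        b_part = sum(bk * u[t - k] for k, bk in enumerate(b) if t - k >= 0)
        y.append(a_part + b_part + rng.gauss(0.0, sigma))
    n = n_steps
    # Terminal term plus running terms of (2.8), including the 1/2 factors.
    cost = 0.5 * q[n] * (y[n] - y_ref[n]) ** 2
    for t in range(n):
        cost += 0.5 * (q[t] * (y[t] - y_ref[t]) ** 2 + r[t] * (u[t] - u_ref[t]) ** 2)
    return cost, y, u
```

Averaging `cost` over many seeds estimates $J(0)$ in (2.7) for a given policy.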

Given that an admissible control decision $u(t)$ is a function of $Z(t)$ as well as of the statistical description of the future observations [6], the optimum solution to the problem is given by the stochastic dynamic programming as

$$ u^*(t) = \arg\min_{u(t)} E\left\{ \tfrac{1}{2} q(t)[y(t) - y_r(t)]^2 + \tfrac{1}{2} r(t)[u(t) - u_r(t)]^2 + J^*[t+1, u(t)] \,\middle|\, Z(t), u(t) \right\} \qquad (2.10) $$

where $J^*[t+1, u(t)]$ is the optimum cost-to-go from $t+1$ to the end and is a function of the present control decision $u(t)$. The globally optimum control cannot, in general, be computed. The only sure way of avoiding the "curse of dimensionality" [11] is by finding a recursion in the cost-to-go, and here no such recursion exists because of the parameter uncertainty. Several computable suboptimal control algorithms for this problem do exist, however, including two of particular interest here: the so-called heuristic certainty equivalence (HCE) algorithm [6] and the Deshpande-Upadhyay-Lainiotis (DUL) algorithm [13]. In the HCE algorithm, a current best estimate of $\theta$ is computed as

$$ \hat\theta(t) = \sum_{j=1}^{M} \lambda_j(t)\,\theta_j \qquad (2.11) $$

$\hat\theta(t)$ is then used as if it were the true parameter vector, and under this assumption the optimum control is easily computed. Thus, in a heuristic manner, certainty equivalence (though untrue) is enforced. In the DUL algorithm, the control decision is obtained as

$$ u(t) = \sum_{j=1}^{M} \lambda_j(t)\,u_j(t) \qquad (2.12) $$

where $u_j(t)$ is the optimum control which would result if $\theta = \theta_j$ were in fact true (again easily computed). Both the HCE and DUL algorithms are passively adaptive: they do not assess the effect which the current control decision will have on future learning. HCE and DUL are algorithms of the feedback type, rather than of the truly closed-loop type as defined in [6]. The optimum control to be derived by approximation of (2.10) is a closed-loop control, capable of taking advantage of the dual effect [7] of the control in this problem.
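For intuition about (2.11) and (2.12), the sketch below contrasts HCE and DUL in a deliberately simplified, hypothetical setting. It uses first-order models $y(t+1) = a_j y(t) + b_j u(t) + e(t+1)$ and a one-step horizon with $r = 0$. Each model-optimal control is then the one-step deadbeat control $u_j = [y_r(t+1) - a_j y(t)]/b_j$. The paper's model-optimal controls come from the full $N$-step problem; only the way the probabilities $\lambda_j(t)$ enter is meant to carry over.

```python
def model_optimal_control(a_j, b_j, y_t, y_ref_next):
    """One-step deadbeat control if theta_j = (a_j, b_j) were true (r = 0)."""
    return (y_ref_next - a_j * y_t) / b_j


def hce_control(models, probs, y_t, y_ref_next):
    """HCE: plug theta_hat = sum_j lambda_j theta_j, eq. (2.11), into the control law."""
    a_hat = sum(p * a for p, (a, _) in zip(probs, models))
    b_hat = sum(p * b for p, (_, b) in zip(probs, models))
    return model_optimal_control(a_hat, b_hat, y_t, y_ref_next)


def dul_control(models, probs, y_t, y_ref_next):
    """DUL: probability-weighted sum of the model-optimal controls, eq. (2.12)."""
    return sum(p * model_optimal_control(a, b, y_t, y_ref_next)
               for p, (a, b) in zip(probs, models))
```

The two controls generally differ. HCE averages the parameters before inverting $b$, whereas DUL averages the resulting controls. Neither accounts for how $u(t)$ will change $\lambda_j(t+1)$.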

For the moment, attention is focused on the two-model ($M = 2$) case. As will be shown in Section VI, the solution of the general $M$-model case ($M > 2$) may be obtained by solving a fixed set of two-model subproblems. The problem of pairwise ($M = 2$) model discrimination will be shown to embody the basic duality of the control.

III. Preposterior Analysis and the Approximate Solution
of the Stochastic Dynamic Programming Equation

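Consider the case $M = 2$ with the probabilities defined in (2.5):

$$ \lambda_1(0) = \Pi(0), \qquad \lambda_2(0) = 1 - \Pi(0) \qquad (3.1) $$

$$ E[J^*(t+1) \mid Z(t), u(t)] = \int \min_{L(t+1)} \Big\{ \Pi(t+1)\,E[C(t+1) \mid Z(t+1), L(t+1), \theta = \theta_1] + [1 - \Pi(t+1)]\,E[C(t+1) \mid Z(t+1), L(t+1), \theta = \theta_2] \Big\}\, p[\Pi(t+1) \mid Z(t), u(t)]\, d\Pi(t+1) \qquad (3.10) $$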

where $p[\Pi(t+1) \mid Z(t), u(t)]$ is the preposterior probability density function of $\Pi(t+1)$, which is the information state for the parameters at $t+1$. The term preposterior [18] means that this is the prior density (with respect to time $t+1$) of the posterior $\Pi(t+1)$, conditioned on the information at $t$. This density is obtained using (3.9) as

$$ p[\Pi(t+1) \mid Z(t), u(t)] = \left| \frac{dy(t+1)}{d\Pi(t+1)} \right| p[y(t+1) \mid Z(t), u(t)] $$

where the prior $\Pi(0)$ is known.

In order to obtain a computationally implementable algorithm, the cost-to-go in (2.10) will be approximated as follows. The future controls (for $i \geq t+1$) will be assumed to be of fixed structure: they will be of the DUL type, but with time-varying probabilities as more information becomes available to the controller. This is expressed as

$$ E[J^*(t+1) \mid Z(t), u(t)] = E\Big\{ \min_{L(t+1)} E[C(t+1) \mid Z(t+1), L(t+1)] \,\Big|\, Z(t), u(t) \Big\} \qquad (3.2) $$

where $L(t+1)$ is the set of all parameters in the controller structure from $t+1$ through the end. Using the total probability theorem, the optimum cost-to-go in (3.2) may be written as

$$ J^*(t+1) = \min_{L(t+1)} \Big\{ \Pi(t+1)\,E[C(t+1) \mid Z(t+1), L(t+1), \theta = \theta_1] + [1 - \Pi(t+1)]\,E[C(t+1) \mid Z(t+1), L(t+1), \theta = \theta_2] \Big\} \qquad (3.3) $$

where

$$ \Pi(t+1) = P[\theta = \theta_1 \mid Z(t+1)] \qquad (3.4) $$

is given by Bayes' rule

$$ \Pi(t+1) = \frac{p[y(t+1) \mid Z(t), u(t), \theta = \theta_1]\,\Pi(t)}{\Pi(t)\,p[y(t+1) \mid Z(t), u(t), \theta = \theta_1] + [1 - \Pi(t)]\,p[y(t+1) \mid Z(t), u(t), \theta = \theta_2]} \qquad (3.5) $$

Using (3.9), the preposterior density of $\Pi(t+1)$ becomes

$$ p[\Pi(t+1) \mid Z(t), u(t)] = \frac{\sigma}{\sqrt{2\pi}\,|\hat y_1(t+1) - \hat y_2(t+1)|\,\Pi(t+1)[1 - \Pi(t+1)]} \Bigg\{ \Pi(t) \exp\Bigg[ -\frac{1}{2\sigma^2} \Bigg( \frac{\sigma^2}{\hat y_2(t+1) - \hat y_1(t+1)} \ln \frac{\Pi(t)[1 - \Pi(t+1)]}{[1 - \Pi(t)]\,\Pi(t+1)} - \tfrac{1}{2}[\hat y_1(t+1) - \hat y_2(t+1)] \Bigg)^2 \Bigg] + [1 - \Pi(t)] \exp\Bigg[ -\frac{1}{2\sigma^2} \Bigg( \frac{\sigma^2}{\hat y_2(t+1) - \hat y_1(t+1)} \ln \frac{\Pi(t)[1 - \Pi(t+1)]}{[1 - \Pi(t)]\,\Pi(t+1)} + \tfrac{1}{2}[\hat y_1(t+1) - \hat y_2(t+1)] \Bigg)^2 \Bigg] \Bigg\} \qquad (3.11) $$

The integration required in (3.10) is still not feasible to perform, even given knowledge of the exact preposterior density (3.11). An approximate solution to this integration is obtained by taking advantage of a fundamental property of the preposterior density. Consider the signal-to-noise ratio

$$ \mathrm{SNR} = \frac{[\hat y_1(t+1) - \hat y_2(t+1)]^2}{\sigma^2} $$

As it increases, the ability to discriminate between the two models increases, and the preposterior density in (3.11) exhibits a distinct bimodal character (see [9]). Most of the density becomes concentrated around two distinct locations, say $\Pi_1(t+1)$ and $\Pi_2(t+1)$. In the limit as $\mathrm{SNR} \to \infty$, $p[\Pi(t+1) \mid Z(t), u(t)]$ becomes the weighted sum of two delta functions


$$ \lim_{\mathrm{SNR} \to \infty} p[\Pi(t+1) \mid Z(t), u(t)] = \Pi(t)\,\delta[\Pi(t+1) - 1] + [1 - \Pi(t)]\,\delta[\Pi(t+1)] \qquad (3.13) $$
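The bimodal behavior leading to (3.13) can be checked numerically. The sketch below uses illustrative names. It samples $y(t+1)$ from the predictive mixture $\Pi(t)\,N(\hat y_1, \sigma^2) + [1 - \Pi(t)]\,N(\hat y_2, \sigma^2)$ implied by (2.1), with the predictions $\hat y_1(t+1) \neq \hat y_2(t+1)$ taken as given. Each sample is then mapped through Bayes' rule for $\Pi(t+1)$. The update is written in log-odds form so that very small likelihoods do not produce 0/0.

```python
import numpy as np


def posterior_model1(pi_t, y_next, y_hat1, y_hat2, sigma):
    """Bayes' rule for Pi(t+1) = P[theta = theta_1 | Z(t+1)] with Gaussian
    likelihoods N(y_hat_j, sigma^2), computed in log-odds form."""
    log_odds = (np.log(pi_t) - np.log1p(-pi_t)
                - ((y_next - y_hat1) ** 2 - (y_next - y_hat2) ** 2) / (2.0 * sigma ** 2))
    return 1.0 / (1.0 + np.exp(-np.clip(log_odds, -700.0, 700.0)))


def sample_preposterior(pi_t, y_hat1, y_hat2, sigma, n=200_000, seed=0):
    """Draws of Pi(t+1) obtained by sampling y(t+1) from the predictive mixture."""
    rng = np.random.default_rng(seed)
    from_model1 = rng.random(n) < pi_t                # which model generates y(t+1)
    means = np.where(from_model1, y_hat1, y_hat2)
    y_next = means + sigma * rng.standard_normal(n)
    return posterior_model1(pi_t, y_next, y_hat1, y_hat2, sigma)
```

At large $|\hat y_1 - \hat y_2|/\sigma$, almost all draws sit near 0 or 1, in proportions close to $1 - \Pi(t)$ and $\Pi(t)$. At small separation they stay near $\Pi(t)$. In both cases the sample mean stays close to $\Pi(t)$, as expected of a Bayesian posterior.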

These observations suggest using the following approximate preposterior density:

$$ p[\Pi(t+1) \mid Z(t), u(t)] \approx \Pi(t)\,\delta[\Pi(t+1) - \Pi_1(t+1)] + [1 - \Pi(t)]\,\delta[\Pi(t+1) - \Pi_2(t+1)] \qquad (3.14) $$


for $j = 1, 2$. If $A_j$, $B_j$ denote the polynomials of (2.2) and (2.3), respectively, formed assuming $\theta = \theta_j$ is true, then (3.7) becomes


$$ \hat y_j(t+1) = A_j(q^{-1})\,y(t) + B_j(q^{-1})\,u(t) \qquad (3.8) $$

From (3.5), one can obtain the inverse transformation from $\Pi(t+1)$ to the latest observation $y(t+1)$:

$$ y(t+1) = \tfrac{1}{2}[\hat y_1(t+1) + \hat y_2(t+1)] + \frac{\sigma^2}{\hat y_2(t+1) - \hat y_1(t+1)} \ln \frac{\Pi(t)[1 - \Pi(t+1)]}{[1 - \Pi(t)]\,\Pi(t+1)} \qquad (3.9) $$

Thus, the outer expectation on the right-hand side of (3.2), which is over $y(t+1)$, can be replaced by an expectation over $\Pi(t+1)$, as in (3.10).

The delta function locations $\Pi_1(t+1)$ and $\Pi_2(t+1)$ satisfy

$$ 0 < \Pi_2(t+1) < \Pi(t) < \Pi_1(t+1) < 1 \qquad (3.15) $$
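The inverse map (3.9) is easy to check in code. Computing $\Pi(t+1)$ from an observation with Bayes' rule and then applying (3.9) should return the same observation. This round-trip sketch uses illustrative names. It requires $\hat y_1(t+1) \neq \hat y_2(t+1)$ and $0 < \Pi(t), \Pi(t+1) < 1$, otherwise the logarithm in (3.9) is undefined.

```python
import math


def posterior_from_observation(pi_t, y_next, y_hat1, y_hat2, sigma):
    """Bayes' rule for Pi(t+1) with Gaussian likelihoods, in log-odds form."""
    log_odds = (math.log(pi_t / (1.0 - pi_t))
                - ((y_next - y_hat1) ** 2 - (y_next - y_hat2) ** 2) / (2.0 * sigma ** 2))
    log_odds = max(min(log_odds, 700.0), -700.0)     # keep math.exp in range
    return 1.0 / (1.0 + math.exp(-log_odds))


def observation_from_posterior(pi_t, pi_next, y_hat1, y_hat2, sigma):
    """Inverse transformation (3.9): the y(t+1) that would give Pi(t+1) = pi_next."""
    ratio = pi_t * (1.0 - pi_next) / ((1.0 - pi_t) * pi_next)
    return 0.5 * (y_hat1 + y_hat2) + sigma ** 2 / (y_hat2 - y_hat1) * math.log(ratio)
```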

The locations $\Pi_1(t+1)$ and $\Pi_2(t+1)$ may be obtained by moment matching: they are chosen so that the first two moments of $\Pi(t+1)$ produced by the approximate density (3.14) match those of the true density (3.11). Such a technique has been used with success in [8], [9]. A simple and accurate technique to carry this out is described in Appendix I.
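One way to carry out this moment matching is sketched below; it is not necessarily the technique of Appendix I. Because a Bayesian posterior is a martingale, the mean of $\Pi(t+1)$ under (3.11) equals $\Pi(t)$. Density (3.14) reproduces this mean whenever $\Pi(t)\Pi_1 + [1 - \Pi(t)]\Pi_2 = \Pi(t)$. Matching the second moment $m_2$ as well gives a closed form, with $V = m_2 - \Pi(t)^2$:

- $\Pi_1 = \Pi(t) + \sqrt{V[1 - \Pi(t)]/\Pi(t)}$
- $\Pi_2 = \Pi(t) - \sqrt{V\,\Pi(t)/[1 - \Pi(t)]}$

These locations satisfy (3.15) whenever $0 < V < \Pi(t)[1 - \Pi(t)]$. The exact moments are computed by Gauss-Hermite quadrature over $y(t+1)$, and the node count is an arbitrary choice.

```python
import numpy as np


def preposterior_moments(pi_t, y_hat1, y_hat2, sigma, n_nodes=80):
    """Mean and second moment of Pi(t+1) under the exact preposterior density,
    by Gauss-Hermite quadrature over y(t+1) for each of the two models."""
    x, w = np.polynomial.hermite_e.hermegauss(n_nodes)
    w = w / np.sqrt(2.0 * np.pi)                 # weights of a standard normal
    m1 = m2 = 0.0
    for prob, mean in ((pi_t, y_hat1), (1.0 - pi_t, y_hat2)):
        y = mean + sigma * x                     # y(t+1) nodes under this model
        log_odds = (np.log(pi_t) - np.log1p(-pi_t)
                    - ((y - y_hat1) ** 2 - (y - y_hat2) ** 2) / (2.0 * sigma ** 2))
        post = 1.0 / (1.0 + np.exp(-np.clip(log_odds, -700.0, 700.0)))
        m1 += prob * float(np.sum(w * post))
        m2 += prob * float(np.sum(w * post ** 2))
    return m1, m2


def two_point_locations(pi_t, m2):
    """Pi_1, Pi_2 with weights Pi(t), 1 - Pi(t) matching mean Pi(t) and second moment m2."""
    var = max(m2 - pi_t ** 2, 0.0)
    pi1 = pi_t + np.sqrt(var * (1.0 - pi_t) / pi_t)
    pi2 = pi_t - np.sqrt(var * pi_t / (1.0 - pi_t))
    return pi1, pi2
```

As the separation of the predictions grows, `two_point_locations` moves toward 1 and 0, consistent with the limit (3.13).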

While the approximate preposterior density has now been established, evaluation of the cost-to-go in (3.10) still requires a minimization with respect to the set $L(t+1)$ of (time-varying) controller parameters from $i = t+1$ to the end of the control period. This set, of course, depends on the statistic $\Pi(t+1)$. An approximate solution to the minimization in (3.10), which is easy to implement, is obtained by assuming a future sequence of DUL-type controls represented by $L(t+1)$:

$$ E[J^*(t+1) \mid Z(t), u(t)] \approx \hat J[t+1, u(t)] = \int \Big\{ \Pi(t+1)\,E[C(t+1) \mid Z(t+1), L(t+1), \theta = \theta_1] + [1 - \Pi(t+1)]\,E[C(t+1) \mid Z(t+1), L(t+1), \theta = \theta_2] \Big\} \Big\{ \Pi(t)\,\delta[\Pi(t+1) - \Pi_1(t+1)] + [1 - \Pi(t)]\,\delta[\Pi(t+1) - \Pi_2(t+1)] \Big\}\, d\Pi(t+1) \qquad (3.16) $$

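Performing the integration, (3.16) becomes

$$ \hat J[t+1, u(t)] = \Pi(t)\,\Pi_1(t+1)\,J_{11}[t+1, u(t), L_{11}(t+1), \theta = \theta_1] + \Pi(t)[1 - \Pi_1(t+1)]\,J_{12}[t+1, u(t), L_{12}(t+1), \theta = \theta_2] + [1 - \Pi(t)]\,\Pi_2(t+1)\,J_{21}[t+1, u(t), L_{21}(t+1), \theta = \theta_1] + [1 - \Pi(t)][1 - \Pi_2(t+1)]\,J_{22}[t+1, u(t), L_{22}(t+1), \theta = \theta_2] \qquad (3.17) $$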
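Once the four nominal costs $J_{lj}$ are available, (3.17) is a probability-weighted sum over the four events $\{\Pi(t+1) = \Pi_l(t+1), \theta = \theta_j\}$. The sketch below simply encodes that weighting. Here `j_costs` stands for the values returned by whatever routine evaluates $J_{lj}[t+1, u(t), L_{lj}(t+1), \theta = \theta_j]$; that routine is hypothetical.

```python
def approx_cost_to_go(pi_t, pi1, pi2, j_costs):
    """Eq. (3.17): weight the nominal costs J_lj, keyed by (l, j), by the
    probabilities of the events {Pi(t+1) = Pi_l(t+1), theta = theta_j}."""
    weights = {
        (1, 1): pi_t * pi1,
        (1, 2): pi_t * (1.0 - pi1),
        (2, 1): (1.0 - pi_t) * pi2,
        (2, 2): (1.0 - pi_t) * (1.0 - pi2),
    }
    return sum(weights[key] * j_costs[key] for key in weights)
```

The four weights always sum to one. If the locations also satisfy $\Pi(t)\Pi_1 + [1 - \Pi(t)]\Pi_2 = \Pi(t)$, the total weight on $\theta = \theta_1$ equals $\Pi(t)$.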


$$ L_{lj}(t+1) = \{ \alpha_{lj}(i), \beta_{lj}(i);\ i = t+1, \ldots, N-1 \} \qquad (4.4) $$

The control gains are given by the weighted sums [13]

$$ \alpha_{lj}(i) = \Omega_{lj}(i)\,\alpha_1(i) + [1 - \Omega_{lj}(i)]\,\alpha_2(i) \qquad (4.5) $$

where $\alpha_1$, $\alpha_2$ are the model-optimal gains. An analogous equation yields $\beta_{lj}(i)$.

Consider the nominal posterior probability that the controller will attach to the parameter being $\theta = \theta_1$, when in fact it is $\theta = \theta_j$, $j = 1$ or $2$, and at $t+1$ it started with $\Pi_l(t+1)$, $l = 1$ or $2$. Using (3.5), this probability is

(4.6)

$$ i = t+1, \ldots, N-2; \qquad \Omega_{lj}(t+1) = \Pi_l(t+1) $$

where the "mismatched" ($k \neq j$) prediction is

$$ \hat y^{(k)}_{lj}(i+1) = E[y_{lj}(i+1) \mid Z_{lj}(i), L_{lj}(i), \theta = \theta_k] = A_k\,y_{lj}(i) + B_k\,u_{lj}(i), \quad k \neq j \qquad (4.7) $$


Equation (3.17) represents the approximate cost-to-go resulting from a particular control choice $u(t)$. The nominal sequence of control parameters $L_{lj}(t+1)$, $l, j = 1, 2$, consists of a DUL weighted sum of model control gains. This sum is computed with nominal weighting factors given by:

1) $\Pi(t+1) = \Pi_l(t+1)$ as the initial sufficient statistic for $\theta$ at $t+1$;

2) subsequent nominal posterior probabilities $\Omega_{lj}(i)$ that $\theta = \theta_1$, which evolve as $i = t+2, \ldots, N-1$ when this DUL control is applied to the system with $\theta = \theta_j$.

Note that the model control gains are obtained from a standard linear quadratic problem with known parameters. The term $J_{lj}$, which is the corresponding cost, is obtained from a standard recursion for a known linear system with $\theta = \theta_j$, quadratic cost, and a given set of control parameters $L_{lj}(t+1)$; see, for example, [10].


IV. The Nominal Sequence of Future Posterior
Probabilities

The nominal future posterior probabilities $\Omega_{lj}(i)$ are generated by constructing a future observation and control scenario, based on the statistical information contained in the approximate preposterior density function (3.14). This density indicates that, given a specific control decision $u(t)$, the posterior probability $\Pi(t+1)$ will become $\Pi_1(t+1)$ with probability $\Pi(t)$, and $\Pi_2(t+1)$ with probability $[1 - \Pi(t)]$. Using (3.9), it follows that the observation which would produce the posterior $\Pi(t+1) = \Pi_l(t+1)$, $l = 1, 2$, is given by

$$ y_l(t+1) = \tfrac{1}{2}[\hat y_1(t+1) + \hat y_2(t+1)] + \frac{\sigma^2}{\hat y_2(t+1) - \hat y_1(t+1)} \ln \frac{\Pi(t)[1 - \Pi_l(t+1)]}{[1 - \Pi(t)]\,\Pi_l(t+1)} \qquad (4.1) $$


The terms $\Pi_l(t+1)$, $y_l(t+1)$ are now used as initial conditions at $i = t+1$ for a nominal future observation and control sequence. Nominal outputs $y_{lj}(i)$ for a given pair $(l, j)$ are generated by replacing $e(i)$ by its mean, which is zero, in (2.1) with $\theta = \theta_j$:

$$ y_{lj}(i+1) = A_j\,y_{lj}(i) + B_j\,u_{lj}(i), \quad i = t+1, \ldots, N-2; \qquad y_{lj}(t+1) = y_l(t+1), \quad l, j = 1, 2 \qquad (4.2) $$

The nominal controls $u_{lj}(i)$ are generated using a DUL control policy

$$ u_{lj}(i) = \alpha_{lj}(i)\,x_{lj}(i) + \beta_{lj}(i), \quad i = t+1, \ldots, N-1 \qquad (4.3) $$

where $x_{lj}$ is a suitable state vector corresponding to (2.1); it represents the future nominal state corresponding to $y_{lj}$, which was specified above. The set of control parameters is


with $Z_{lj}(i)$ the nominal information vector at time $i$.

Equation (4.6) specifies the four "learning curves" used to compute the cost-to-go (3.17), in order to obtain a feasible solution to the stochastic dynamic programming equation (2.10). Note that if $j = 1$, then $\Omega_{lj}$ converges toward unity for both $l = 1$ and $l = 2$; however, because of (3.15), it will converge faster if $l = 1$. Conversely, if $j = 2$, then $\Omega_{lj}$ converges toward zero, again for both $l = 1, 2$, but faster if $l = 2$.
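The expression (4.6) is not legible in this copy, so the following sketch should be read as an assumption about its form rather than a transcription. It generates a nominal learning curve by applying Bayes' rule for the two hypotheses at each step, using the nominal output of (4.2) (noise set to zero) in place of a real measurement. For brevity, it also replaces the $N$-step model-optimal gains by one-step deadbeat controls for hypothetical first-order models $(a_j, b_j)$. The branch index $l$ enters only through the starting values $\Omega_{lj}(t+1) = \Pi_l(t+1)$ and $y_l(t+1)$. The nominal output equals the true model's prediction exactly, so each update depends only on the mismatched prediction error. The curve is therefore monotone: toward 1 when $j = 1$ and toward 0 when $j = 2$.

```python
import math


def nominal_learning_curve(omega0, y0, models, true_j, y_ref, sigma, n_steps):
    """Hypothetical nominal posterior Omega_lj(i) that theta = theta_1.

    omega0 = Pi_l(t+1), y0 = y_l(t+1); models = [(a1, b1), (a2, b2)];
    true_j selects the model that generates the nominal outputs."""
    (a1, b1), (a2, b2) = models
    a_true, b_true = models[true_j - 1]
    log_odds = math.log(omega0 / (1.0 - omega0))
    y, curve = y0, [omega0]
    for _ in range(n_steps):
        omega = curve[-1]
        u1 = (y_ref - a1 * y) / b1              # one-step model-optimal controls
        u2 = (y_ref - a2 * y) / b2
        u = omega * u1 + (1.0 - omega) * u2     # DUL weighting, cf. (4.5)
        y_next = a_true * y + b_true * u        # nominal output, e = 0, cf. (4.2)
        yhat1 = a1 * y + b1 * u                 # predictions of the two models
        yhat2 = a2 * y + b2 * u
        # Bayes' rule in log-odds form with the nominal output as the observation.
        log_odds -= ((y_next - yhat1) ** 2 - (y_next - yhat2) ** 2) / (2.0 * sigma ** 2)
        clipped = max(min(log_odds, 700.0), -700.0)
        curve.append(1.0 / (1.0 + math.exp(-clipped)))
        y = y_next
    return curve
```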


V.  Some  Remarks  on  the  Properties  of  the  New  Algorithm 

From (2.9), (2.10), (3.16), and (3.17), it can be seen that the MAD control at time $t$ is obtained by numerically locating a minimum with respect to $u(t)$ of the cost function

$$ J[t, u(t)] = \tfrac{1}{2} r(t)[u(t) - u_r(t)]^2 + \hat J[t+1, u(t)] \qquad (5.1) $$

where $\hat J[t+1, u(t)]$ is the approximate cost-to-go as given by (3.17). A golden section line search combined with a quadratic fit [5] may be used to locate $u_{\mathrm{MAD}}(t)$, with the HCE and DUL algorithm controls used to set the initial control search window. Computational evidence indicates that between 5 and 8 function evaluations of $\hat J[t+1, u(t)]$ (5-8 different values of $u(t)$) are sufficient to achieve high accuracy in locating the minimum.
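A plain golden-section search over a window set by the HCE and DUL controls is sketched below. It omits the quadratic-fit refinement [5] mentioned above, so it will typically use more than 5-8 evaluations. The window-widening rule in `mad_search_window` is a hypothetical choice. The search assumes the cost (5.1) is unimodal on that window.

```python
import math


def golden_section_min(f, lo, hi, tol=1e-6, max_iter=200):
    """Minimize a unimodal scalar function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0          # 1/phi, about 0.618
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if hi - lo < tol:
            break
        if fc < fd:                                 # minimum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                                       # minimum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


def mad_search_window(u_hce, u_dul, widen=0.5):
    """Hypothetical initial window around the HCE and DUL controls."""
    lo, hi = min(u_hce, u_dul), max(u_hce, u_dul)
    pad = widen * max(hi - lo, 1e-3)
    return lo - pad, hi + pad
```

In use, `f` would evaluate (5.1) with $\hat J$ from (3.17) for each trial value of $u(t)$.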

By using the approximate preposterior density (3.14), consideration of the possible values $\Pi(t+1)$ may take on is reduced to two "most crucial" values, $\Pi_1(t+1)$ and $\Pi_2(t+1)$. Equation (3.17) then indicates that four possible events need to be considered:

- $\Pi_1(t+1)$ becomes the posterior with the true system $\theta = \theta_1$;
- $\Pi_1(t+1)$ becomes the posterior but $\theta = \theta_2$ is true;
- $\Pi_2(t+1)$ is the statistic but $\theta = \theta_1$;
- $\Pi_2(t+1)$ occurs with the true system $\theta = \theta_2$.

The probabilities of these four events are $\Pi(t)\Pi_1(t+1)$, $\Pi(t)[1 - \Pi_1(t+1)]$, $[1 - \Pi(t)]\Pi_2(t+1)$, and $[1 - \Pi(t)][1 - \Pi_2(t+1)]$, respectively. The cost which will be incurred if the event described by the pair $(\Pi_l(t+1), \theta_j)$ happens is $J_{lj}[t+1, u(t), L_{lj}(t+1), \theta = \theta_j]$.

Consider now how realistically J_ij represents the cost of such an event. First assume i = j; for example, take i = j = 1. Due to the condition described by (3.15), the output y_1(t+1) given by (4.1) with i = 1, which would produce this Π_1(t+1), would more likely come from a system where θ = θ_1 were true. Since j = 1 in J_11, this represents convergence of Π(t+1) = Π_1(t+1) > Π(t) in the right direction, which is toward unity. In the future nominal control scenario described in Section IV, the probabilities Ω_11(i) will then converge steadily toward unity, since the mismatched predicted observation (4.7) appears as a negative exponent in (4.6). If i = j = 2, the exponent in (4.6) is positive and Ω_22(i) converges to zero. Now consider what happens if i ≠ j; for example, if i = 2, j = 1. The true system has θ = θ_1, but a nominal observation y_2(t+1) occurs which makes Π(t+1) = Π_2(t+1) < Π(t); i.e., Π(t+1) goes in the wrong direction. In the subsequent nominal control scenario the observations y_21(i) come from the true system with θ = θ_1, and thus, of course, the posteriors Ω_21(i) will recover from the "bad" initial Ω_21(t+1) = Π_2(t+1), but only after some time. Meanwhile, the control gains of this scenario have been closer to the optimum gains of the wrong system (θ = θ_2), thus accumulating an added cost represented by J_21. Similar statements may be made about the event i = 1, j = 2. Thus, the cross terms i ≠ j represent the costs incurred if learning is degraded by bad observations at t+1. Correspondingly, it is expected that

$$ J_{11} < J_{21}, \qquad J_{22} < J_{12} \tag{5.2} $$

on a range of control values containing the HCE, DUL, and MAD controls. Computational evidence in Section VII indicates that this is indeed so.

Thus, the MAD algorithm is sensitive to the anticipated rate of future learning and, if needed, its present decision will affect that learning rate appropriately.

The evolution of the information about the system during the process, described in detail in the previous two sections, can also be summarized in pictorial form as in Fig. 1. The current probability Π(t) evolves into one of the two values Π_1 or Π_2, from which four "learning" curves follow. These curves are labeled ij, and they correspond to the four cost components from (3.17). This is the essence of the novel approach that yields the closed-loop [7] approximation of the stochastic dynamic programming presented here.

VI.  The General M-Model MAD Control Algorithm

Extension of the two-model MAD algorithm described in Sections III through V to the general case of M models, M > 2, is now discussed. It will be shown that the M-model MAD algorithm consists of performing two-model cost computations for each of the distinct pairs of models using the two-model MAD algorithm, along with one-model optimum cost computations for an appropriate adjustment.

To begin the development, consider first the case M = 3. Define W_1, W_2, and W_3 as the three mutually exclusive and exhaustive events θ = θ_1, θ = θ_2, and θ = θ_3 true, respectively. Then the mixed probability expression [25] p[J, W_1 ∪ W_2 ∪ W_3], where J is a random variable, can be written as

$$
\begin{aligned}
p[J, W_1 \cup W_2 \cup W_3] &= p[J, W_1 \cup W_2] + p[J, W_1 \cup W_3] + p[J, W_2 \cup W_3] \\
&\quad - p[J, W_1] - p[J, W_2] - p[J, W_3] = p[J]
\end{aligned}
\tag{6.1}
$$

where the union W_k ∪ W_j signifies the event that one of θ_k, θ_j is the true parameter vector, and where W_1 ∪ W_2 ∪ W_3 is the sure event (note that W_1 ∩ W_2 ∩ W_3 = ∅, the null set). Using (6.1) for a cost-to-go J(t+1), one can write

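$$
\begin{aligned}
p[J(t+1)\mid Z(t),u(t)] &= [\Lambda_1(t)+\Lambda_2(t)]\, p[J(t+1)\mid Z(t),u(t),W_1\cup W_2] \\
&\quad + [\Lambda_1(t)+\Lambda_3(t)]\, p[J(t+1)\mid Z(t),u(t),W_1\cup W_3] \\
&\quad + [\Lambda_2(t)+\Lambda_3(t)]\, p[J(t+1)\mid Z(t),u(t),W_2\cup W_3] \\
&\quad - \Lambda_1(t)\, p[J(t+1)\mid Z(t),u(t),W_1] - \Lambda_2(t)\, p[J(t+1)\mid Z(t),u(t),W_2] \\
&\quad - \Lambda_3(t)\, p[J(t+1)\mid Z(t),u(t),W_3]
\end{aligned}
\tag{6.2}
$$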

From  (6.2)  it  follows  that 

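$$
\begin{aligned}
E[J(t+1)\mid Z(t),u(t)] &= [\Lambda_1(t)+\Lambda_2(t)]\, E[J(t+1)\mid Z(t),u(t),W_1\cup W_2] \\
&\quad + [\Lambda_1(t)+\Lambda_3(t)]\, E[J(t+1)\mid Z(t),u(t),W_1\cup W_3] \\
&\quad + [\Lambda_2(t)+\Lambda_3(t)]\, E[J(t+1)\mid Z(t),u(t),W_2\cup W_3] \\
&\quad - \Lambda_1(t)\, E[J(t+1)\mid Z(t),u(t),W_1] - \Lambda_2(t)\, E[J(t+1)\mid Z(t),u(t),W_2] \\
&\quad - \Lambda_3(t)\, E[J(t+1)\mid Z(t),u(t),W_3]
\end{aligned}
\tag{6.3}
$$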

Now, for arbitrary M ≥ 2 it can be shown (see, e.g., [17]) that

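$$
\begin{aligned}
E[J(t+1)\mid Z(t),u(t)] &= \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} [\Lambda_i(t)+\Lambda_j(t)]\, E[J(t+1)\mid Z(t),u(t),W_i\cup W_j] \\
&\quad - (M-2)\sum_{j=1}^{M} \Lambda_j(t)\, E[J(t+1)\mid Z(t),u(t),W_j]
\end{aligned}
\tag{6.4}
$$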


Fig. 1. "Learning curves" for evaluation of the cost-to-go.


Equation  (6.4)  states  the  following. 

Theorem: For a specified u(t), the cost-to-go J(t+1), given that one of M models θ_j, j = 1, ..., M, is correct, can be obtained as follows.

1) First compute the cost-to-go which results if one of either θ_k, θ_j, k ≠ j, is true; this is done for each of the M' = M(M − 1)/2 distinct model pairs.

2) Compute the optimum model costs (θ_j true), j = 1, ..., M, and form the overall cost-to-go according to (6.4).

Of course, all the expectations in (6.4) are conditioned on the same information state Z(t) and control choice u(t). The model costs are easily computed from a standard linear quadratic problem. For each of the two-model costs E[J(t+1) | Z(t), u(t), W_k ∪ W_j], an approximate cost is computed using the MAD algorithm of Sections III through V. Since the event W_k ∪ W_j means that either θ_k or θ_j is true, the required sufficient statistic Π^kj(t) in the two-model MAD cost evaluation for the specified pair of models is

$$ \Pi^{kj}(t) = \frac{\Lambda_k(t)}{\Lambda_k(t)+\Lambda_j(t)} \tag{6.5} $$

thus maintaining proper normalization.
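The combination rule (6.4), together with the pairwise statistic (6.5), can be checked numerically. In the sketch below (function names hypothetical), each pairwise conditional expectation is formed exactly as Π^kj E_k + (1 − Π^kj) E_j from hypothetical single-model costs E_j; in the MAD algorithm these pairwise terms would instead come from the approximate two-model computation. With exact pairwise terms, (6.4) must reproduce the plain mixture Σ_j Λ_j E_j.

```python
from itertools import combinations


def pair_statistic(lam, k, j):
    """Pi^{kj}(t) of (6.5): probability of theta_k given that theta_k or theta_j is true."""
    return lam[k] / (lam[k] + lam[j])


def m_model_cost(lam, model_cost, pair_cost):
    """Overall expected cost-to-go assembled from pairwise and single-model costs as in (6.4)."""
    m = len(lam)
    pair_terms = sum((lam[i] + lam[j]) * pair_cost(i, j) for i, j in combinations(range(m), 2))
    single_terms = sum(lam[j] * model_cost[j] for j in range(m))
    return pair_terms - (m - 2) * single_terms


# Hypothetical posterior model probabilities and single-model costs.
lam = [0.2, 0.5, 0.3]
model_cost = [10.0, 20.0, 40.0]


def exact_pair_cost(k, j):
    """Exact E[J | W_k or W_j] for the hypothetical costs above."""
    p = pair_statistic(lam, k, j)
    return p * model_cost[k] + (1.0 - p) * model_cost[j]


print(m_model_cost(lam, model_cost, exact_pair_cost))
print(sum(l * c for l, c in zip(lam, model_cost)))
```

Each model appears in M − 1 pairs, so the pairwise sum counts every single-model term M − 1 times; subtracting (M − 2) copies leaves each counted exactly once.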

The general M-model (M > 2) MAD algorithm thus searches for a minimum in

$$ J[t,u(t)] = r(t)\,[u(t)-u_r(t)]^2 + J[t+1,u(t)] \tag{6.6} $$

where J[t+1, u(t)] is given by (6.4).

VII.  Numerical  Examples 

In the numerical studies, attention was focused on studying the performance and characteristics of the MAD algorithm for the case M = 2, since the pairwise model discrimination procedure constitutes the very essence of the actively adaptive decision-making process of the algorithm. Performance will be compared with that of the passively adaptive HCE and DUL algorithms.

Example 1: A second-order system (n = 2) is considered with two poles at 0.7. It is not certain whether the true system's zero is at −0.225 or at −0.9. Correspondingly, the true system parameter vector is one of the following:

$$ \theta_1' = [1.4 \;\; -0.49 \;\; 2 \;\; 0.45] \tag{7.1} $$

$$ \theta_2' = [1.4 \;\; -0.49 \;\; 1 \;\; 0.9] \tag{7.2} $$

which are considered a priori equiprobable. The initial output is y(0) = 0.1, and it is desired to make it follow over N = 5 time steps the reference trajectory (for t = 0, 1, ..., 5)

$$ y_r = [0.1 \;\; 0.5 \;\; 1 \;\; 2 \;\; 2.5 \;\; 10]. \tag{7.3} $$

The corresponding weightings in the cost (2.8) for t = 0, 1, ..., 5 were chosen as

$$ [0 \;\; 1 \;\; 2 \;\; 3 \;\; 5 \;\; 50]. \tag{7.4} $$

No penalty was attached to the control. Note that this would be a straightforward minimum variance controller about a desired output if the parameters were known [4]. The process noise standard deviation was chosen as λ = 1.5.
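The stated pole and zero locations can be checked from (7.1) and (7.2). The sketch assumes, for illustration only, that the components of θ are ordered [a_1, a_2, b_1, b_2] in y(t+1) = a_1 y(t) + a_2 y(t−1) + b_1 u(t) + b_2 u(t−1) + noise; the system equation is not reproduced in this section, and this form was chosen because it yields the two poles at 0.7 stated above.

```python
import numpy as np

# Assumed ordering theta = [a1, a2, b1, b2] (hypothetical; see lead-in).
THETA_1 = np.array([1.4, -0.49, 2.0, 0.45])
THETA_2 = np.array([1.4, -0.49, 1.0, 0.9])


def poles_and_zero(theta):
    """Poles: roots of z^2 - a1 z - a2.  Zero: root of b1 z + b2."""
    a1, a2, b1, b2 = theta
    poles = np.roots([1.0, -a1, -a2])
    zero = -b2 / b1
    return poles, zero


for name, theta in (("theta_1", THETA_1), ("theta_2", THETA_2)):
    poles, zero = poles_and_zero(theta)
    print(name, "poles:", np.round(poles.real, 6), "zero:", zero)
```

The two models share the denominator z² − 1.4z + 0.49 = (z − 0.7)² and differ only in the numerator, so the uncertainty is entirely in the zero location.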

A  Monte  Carlo  simulation  procedure  was  conducted  to  compare  the 
performance  of  the  MAD  control  algorithm  with  the  performances  of 
the  HCE  and  DUL  algorithms,  when  each  is  applied  to  this  problem. 
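TABLE I
Sample Average Costs and Standard Deviations for Example 1

Algorithm                    OPT           HCE    DUL           MAD
Sample mean                  63.8          154    [illegible]   [illegible]
Sample standard deviation    [illegible]   233    [illegible]   [illegible]

TABLE III
Sample Average Costs and Standard Deviations for Example 2

Algorithm                    OPT           HCE           DUL           MAD
Sample mean                  60.5          [illegible]   [illegible]   [illegible]
Sample standard deviation    [illegible]   [illegible]   [illegible]   [illegible]

(Other entries legible in the source scan, positions uncertain: 364, 331, 109, 73.2, 434, 137.)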

Statistical tests were made on the results of 200 independent Monte Carlo runs. Each of the 200 sets of disturbances was used to generate a run for each of the three control algorithms examined. For Π(0)·200 = 100 runs the true parameter vector was set at θ = θ_1, and for [1 − Π(0)]·200 = 100 runs it was set at θ = θ_2. Sample means and variances of the Monte Carlo costs C, defined by (2.8), were computed.

Table I contains the results. The column labeled OPT is the performance for the same disturbances when the optimal control with θ known is used. This table gives the first indication of the improvement MAD gives over HCE and DUL, both in mean cost reduction and in reduction of the variability of the performance.

Note that Table I does not provide a rigorous argument that the actual performances (expected costs) are ordered as the sample means indicate. Appendix II presents a rigorous statistical test that answers the question of whether the expected values of the costs are different.

To carry out this test, three new data sequences are formed by taking the differences of the cost samples generated using the same random variables for each of the methods HCE, DUL, and MAD. That is,

$$ \Delta_i^{HD} = C_i^{HCE} - C_i^{DUL} \tag{7.5} $$

$$ \Delta_i^{HM} = C_i^{HCE} - C_i^{MAD} \tag{7.6} $$

$$ \Delta_i^{DM} = C_i^{DUL} - C_i^{MAD} \tag{7.7} $$

for i = 1, ..., 200. The sample means Δ̄ of the differences and their standard deviations σ_Δ̄ for the various algorithms are given in Table II.

TABLE II
Statistical Test for Algorithm Comparisons for Example 1

Algorithms compared    Test statistic    Estimated improvement (%)
HCE - DUL              [illegible]       2
HCE - MAD              [illegible]       33
DUL - MAD              [illegible]       31

TABLE IV
Statistical Test for Algorithm Comparisons for Example 2

Algorithms compared    Test statistic    Estimated improvement (%)
HCE - DUL              [illegible]       16
HCE - MAD              [illegible]       [illegible]
DUL - MAD              [illegible]       [illegible]

(Other entries legible in the source scan, positions uncertain: 5.30, 59.)

Assuming that a hypothesis can be accepted only if the probability of error (level of significance) α is less than 5 percent, i.e., the confidence (1 − α) is at least 95 percent, the threshold against which we compare the test statistic Δ̄/σ_Δ̄ is μ = 1.65. The test statistic has to exceed the threshold in order to accept the hypothesis. The conclusions that can be drawn for this problem from Table II are the following.

1) The hypothesis that DUL is better than HCE cannot be accepted. The estimated improvement of 2 percent is not statistically significant (α = 30 percent is too large a probability of error to accept that DUL is better than HCE).

2) The hypothesis that MAD is better than HCE is accepted (actually with 99.99 percent confidence). The estimated improvement (decrease in cost) of 33 percent is statistically significant.

3) The hypothesis that MAD is better than DUL is accepted (actually with 99.87 percent confidence). The estimated improvement of 31 percent is statistically significant.
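The confidence levels quoted above follow from the normal tail probability of the test statistic. A minimal sketch using Python's standard `statistics.NormalDist` (the helper names are hypothetical) converts between a threshold μ and the one-sided level of significance α.

```python
import math
from statistics import NormalDist


def level_of_significance(z):
    """One-sided alpha = P{N(0, 1) > z} for a test statistic threshold z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def threshold(alpha):
    """Threshold mu such that P{N(0, 1) > mu} = alpha."""
    return NormalDist().inv_cdf(1.0 - alpha)


print(round(threshold(0.05), 3))              # 5 percent level used in the text
print(round(level_of_significance(1.65), 4))
print(round(threshold(0.0013), 2))            # 99.87 percent confidence
print(round(threshold(0.30), 2))              # alpha of 30 percent
```

A confidence of 99.87 percent thus corresponds to a test statistic of about 3.0, whereas α = 30 percent corresponds to a statistic of only about 0.52, far below the 1.65 threshold.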

Note that MAD has gone about 55 percent of the way between DUL and OPT; the latter is, however, an unachievable lower bound because it assumes the parameters known. The Bayesian optimal controller for unknown parameters (obtained from the stochastic dynamic programming) is somewhere between OPT and MAD. Thus, MAD seems to have gone "most of the way" towards the Bayesian optimum.

Example 2: This example is the same as the first one except for the cost weightings, which are

$$ [0 \;\; 1 \;\; 1 \;\; 1 \;\; 5 \;\; 50] \tag{7.8} $$

and the reference trajectory

$$ y_r = [0.1 \;\; 0.5 \;\; 1 \;\; 2 \;\; 0.1 \;\; 10]. \tag{7.9} $$

The resulting average costs and standard deviations from 200 Monte Carlo runs are shown in Table III.

Table  IV  indicates  the  following. 

1) The hypothesis that DUL is better than HCE is accepted. The estimated improvement of 16 percent is statistically significant (α < 0.1 percent).

2) The hypotheses that MAD is better than both HCE and DUL are accepted (α < 0.001 percent).

Also  note  that  MAD  reduces  by  50  percent  the  cost  incurred  with 
DUL,  based  on  the  200  Monte  Carlo  runs. 

Next, the learning properties of the above algorithms are illustrated by presenting further results from the simulations of Example 2. The first part of Table V shows the evolution in time of the posterior probability that θ = θ_1 (averaged over 100 runs) when the true system had θ = θ_1. These probabilities all tend to unity, but the active learning feature of MAD causes its probability to converge faster. Thus, active probing, the need for which is realized only by MAD, pays off. The second part of this table presents the corresponding results for the case θ = θ_2 true, where convergence to zero (as required) is again faster for MAD.

The need for active learning as sensed by MAD is illustrated in Table VI. For various possible values of the control at period 1, the MAD algorithm evaluates the future learning opportunities. For u(1) = 4.3, the preposterior density characterized by Π_1 and Π_2 indicates that not enough learning will take place: the contribution of J_21 (which is the result of a mismatched controller that does not learn fast enough what the true system is) to the cost makes J (see (5.1)) large. For larger u(1), the learning is faster, but after a point its price exceeds the benefit.

Examination of Table VI also gives valuable insight into the problem of determining when there is value in using an actively adaptive controller like MAD: when the penalty for mismatched controllers is large and its contribution to the cost is significant.

TABLE V
Evolution of P{θ = θ_1 | Z(i)}

[The entries of Table V are illegible in the source scan.]

TABLE VI
Cost Breakdown and Learning for MAD

u(1)     J             Π_1(2)        Π_2(2)        J_11     J_12     J_21     J_22
4.300    127.4         0.9398        0.05919       64.07    67.92    2090     66.39
4.500    112.0         0.9473        0.05185       64.78    67.38    1787     65.87
4.700    [illegible]   0.9540        0.04518       65.65    67.10    714.2    65.39
4.900    69.74         0.9602        0.03917       66.68    66.96    181.7    64.94
5.100    [illegible]   0.9655        0.03388       67.85    66.92    115.0    64.55
5.300    [illegible]   0.9704        0.02910       69.22    66.98    112.0    64.18
5.500    69.53         0.9745        0.02502       70.73    67.13    118.3    63.86
5.700    70.20         0.9781        0.02154       72.40    67.37    126.4    63.58
5.900    70.97         [illegible]   [illegible]   74.23    67.72    135.1    63.34

VIII.  Summary  and  Conclusions 

The concept of preposterior analysis has been successfully used to derive an approximation to the stochastic dynamic programming equation for the control of systems with discrete-valued random parameters. The resulting algorithm, called model adaptive dual control, is the only actively adaptive controller for this class of systems. A rigorous methodology for comparison of control algorithms has been presented and used to show that the new actively adaptive controller yields statistically significant performance improvement over two state-of-the-art passively adaptive controllers. The question of when it is worthwhile to use an actively adaptive controller (which is relatively expensive) versus a passively adaptive one has also been addressed. While Monte Carlo studies combined with the appropriate statistical analysis techniques are the best tool, a decomposition of the cost-to-go can be utilized to assess inexpensively whether one can expect a significant improvement when using this actively adaptive control versus a passive one. Based on our experience, the class of problems in which one can expect benefit from using an actively adaptive control is where there is a heavy terminal state penalty and the control period is relatively short, i.e., passive learning does not suffice and there is opportunity and need for active learning. In general, active adaptation can be expected to improve the transient behavior in adaptive control by speeding up the adaptation process.


Appendix  I 

Moment Matching for the Approximate Preposterior Density

The moment matching technique used to obtain Π_j(t+1), j = 1, 2, in the approximate preposterior density (3.14) is now described. First consider finding the true moments E[Π(t+1) | Z(t), u(t)] and E[Π²(t+1) | Z(t), u(t)], the latter denoted $\overline{\Pi^2}(t+1)$. From the fundamental theorem of expectation and (3.5),

$$ E[\Pi(t+1)\mid Z(t),u(t)] = \Pi(t) \tag{A.1} $$

The second moment $\overline{\Pi^2}(t+1)$ must be obtained by numerical integration using either (3.11) or (3.5) combined with p[y(t+1) | Z(t), u(t)]. The latter approach lends itself to a particularly simple and accurate integration procedure. Thus, take

$$ \overline{\Pi^2}(t+1) = \int_{-\infty}^{\infty} \Pi^2\big(y(t+1)\big)\, p[y(t+1)\mid Z(t),u(t)]\, dy(t+1) \tag{A.2} $$


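$$ \Pi(t+1) = [\text{expression illegible in the source scan}], \qquad j = 1 \text{ or } 2 \tag{A.3} $$

where δ(t+1) and ε_j(t+1) are defined in (A.4) and (A.5), respectively; these definitions are illegible in the source scan.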

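Using (A.3) through (A.5) and (3.5), the integration (A.2) may be shown to reduce to

$$ \overline{\Pi^2}(t+1) = \sqrt{2}\,\lambda_\varepsilon(t+1)\Big\{\Pi(t)\int_{-\infty}^{\infty} f_1(\tau_1)\,e^{-\tau_1^2}\,d\tau_1 + [1-\Pi(t)]\int_{-\infty}^{\infty} f_2(\tau_2)\,e^{-\tau_2^2}\,d\tau_2\Big\} \tag{A.3} $$

where

$$ \tau_j = \frac{\varepsilon_j(t+1)}{\sqrt{2}\,\lambda_\varepsilon(t+1)} \tag{A.4} $$

and f_j(τ_j), j = 1, 2, given by (A.5), depends on the ratio [1 − Π(t)]/Π(t) and on the term 2√2 λ_ε(t+1) τ_j + (−1)^(j−1) δ(t+1); the complete expression is illegible in the source scan.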

The integrals in (A.3) reduce to simple finite-length summations through the use of Hermite quadrature [12]; this technique is described by

$$ \int_{-\infty}^{\infty} f(x)\,e^{-x^2}\,dx \approx \sum_{i=1}^{K} H_i\, f(x_i) \tag{A.6} $$

where the x_i are the zeros of the Hermite orthogonal polynomial of degree K and the H_i are the respective Hermite coefficients. The H_i, x_i are well tabulated [12]. The number of terms K in the expansion is chosen large enough to achieve the desired accuracy in (A.6).
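A minimal check of (A.6) using NumPy's `numpy.polynomial.hermite.hermgauss`, which returns the nodes x_i (zeros of the degree-K Hermite polynomial) and the weights H_i for the weight function e^(−x²). The test integrands are illustrative choices with known closed forms.

```python
import math

import numpy as np
from numpy.polynomial.hermite import hermgauss


def hermite_integral(f, K=20):
    """Approximate the integral of f(x) * exp(-x**2) over the real line by sum_i H_i f(x_i), as in (A.6)."""
    x, H = hermgauss(K)
    return float(np.sum(H * f(x)))


# Closed forms: sqrt(pi), sqrt(pi)/2, and sqrt(pi) * exp(-1/4).
print(hermite_integral(lambda x: np.ones_like(x)), math.sqrt(math.pi))
print(hermite_integral(lambda x: x ** 2), math.sqrt(math.pi) / 2)
print(hermite_integral(np.cos), math.sqrt(math.pi) * math.exp(-0.25))
```

With K nodes the rule is exact for polynomial f of degree up to 2K − 1, which is why a moderate K already gives high accuracy for smooth integrands.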

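Using (A.6), (A.3) becomes

$$ \overline{\Pi^2}(t+1) \approx \sqrt{2}\,\lambda_\varepsilon(t+1) \sum_{i=1}^{K} H_i\,\big\{\Pi(t)\, f_1(x_i) + [1-\Pi(t)]\, f_2(x_i)\big\} \tag{A.7} $$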

Equating (A.1) and (A.7) to the respective moments produced by (3.14) gives

$$ \Pi(t) = \Pi(t)\,\Pi_1(t+1) + [1-\Pi(t)]\,\Pi_2(t+1) \tag{A.8} $$

$$ \overline{\Pi^2}(t+1) = \Pi(t)\,\Pi_1^2(t+1) + [1-\Pi(t)]\,\Pi_2^2(t+1) \tag{A.9} $$


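Now note from (3.5) that

Using inequality (3.15), (A.8) and (A.9) then yield the desired delta function locations

$$ \Pi_1(t+1) = \Pi(t) + \Big\{\frac{1-\Pi(t)}{\Pi(t)}\big[\overline{\Pi^2}(t+1) - \Pi^2(t)\big]\Big\}^{1/2} \tag{A.10} $$

$$ \Pi_2(t+1) = \Pi(t) - \Big\{\frac{\Pi(t)}{1-\Pi(t)}\big[\overline{\Pi^2}(t+1) - \Pi^2(t)\big]\Big\}^{1/2} \tag{A.11} $$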
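The whole moment-matching step can be sketched end to end under a simplifying assumption that is ours, not the source's: the predictive densities of y(t+1) under θ_1 and θ_2 are taken as Gaussian with means ŷ_1, ŷ_2 and a common standard deviation λ, so that the Bayes update of Π(t+1) takes the form 1/{1 + [(1 − Π)/Π]·(likelihood ratio)}. The second moment (A.2) is evaluated with Gauss-Hermite quadrature after the change of variable y = ŷ_j + √2 λ τ, and the delta-function locations then follow from (A.10) and (A.11). All function names and numbers are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss


def posterior_pi(y, pi, yhat1, yhat2, lam):
    """Pi(t+1) = P{theta_1 | y(t+1)} for Gaussian predictive densities with common std lam."""
    # log[p2(y) / p1(y)] for the two Gaussian predictive densities
    log_ratio = 0.5 * (((y - yhat1) / lam) ** 2 - ((y - yhat2) / lam) ** 2)
    return 1.0 / (1.0 + (1.0 - pi) / pi * np.exp(log_ratio))


def moments(pi, yhat1, yhat2, lam, K=40):
    """First and second moments of Pi(t+1) over the predictive mixture, by Gauss-Hermite quadrature."""
    tau, H = hermgauss(K)
    m1 = m2 = 0.0
    for weight, yhat in ((pi, yhat1), (1.0 - pi, yhat2)):
        post = posterior_pi(yhat + np.sqrt(2.0) * lam * tau, pi, yhat1, yhat2, lam)
        m1 += weight * np.sum(H * post) / np.sqrt(np.pi)
        m2 += weight * np.sum(H * post ** 2) / np.sqrt(np.pi)
    return float(m1), float(m2)


def delta_locations(pi, m2):
    """Pi_1(t+1) and Pi_2(t+1) from (A.10) and (A.11)."""
    spread = max(m2 - pi ** 2, 0.0)
    pi1 = pi + np.sqrt((1.0 - pi) / pi * spread)
    pi2 = pi - np.sqrt(pi / (1.0 - pi) * spread)
    return float(pi1), float(pi2)


pi = 0.5
m1, m2 = moments(pi, yhat1=0.0, yhat2=1.0, lam=1.0)
pi1, pi2 = delta_locations(pi, m2)
print(f"E[Pi] = {m1:.6f}, E[Pi^2] = {m2:.6f}, Pi_1 = {pi1:.4f}, Pi_2 = {pi2:.4f}")
```

The first moment reproduces Π(t), as (A.1) requires, and a larger separation between ŷ_1 and ŷ_2 (a more informative control) increases the second moment and spreads Π_1 and Π_2 further apart.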


Appendix  II 

Statistical  Significance  in  the  Comparison  of  Controller 
Performances 

Suppose that a Monte Carlo simulation is performed to compare two control algorithms. The corresponding expected costs are J^(1) and J^(2). If S independent runs are made with the first algorithm, this yields S independent samples C_i^(1) from a distribution with the true but unknown mean J^(1). If the same random variables that entered into the Monte Carlo runs with the first algorithm are used to generate S runs with the second algorithm, this yields S samples C_i^(2) from a distribution with the also unknown mean J^(2).

The sample means

$$ \bar{C}^{(j)} = \frac{1}{S}\sum_{i=1}^{S} C_i^{(j)}, \qquad j = 1, 2 \tag{B.1} $$

are estimates of the corresponding performances (true means). Concluding from

$$ \bar{C}^{(1)} < \bar{C}^{(2)} \tag{B.2} $$

that algorithm 1 is better than algorithm 2 must be qualified by a probability α of a type I error [16].

Thus, the statistical test needed is

$$ H_0:\ \Delta = 0 \quad \text{(algorithm 1 not best)} \tag{B.3} $$

$$ H_1:\ \Delta \triangleq J^{(2)} - J^{(1)} > 0 \quad \text{(algorithm 1 best)} \tag{B.4} $$

The probability of error (also called the level of significance) α is defined as

$$ \alpha = P\{\text{accept } H_1 \mid H_0 \text{ true}\} \tag{B.5} $$

Then, since accepting H_1 means rejecting H_0, the lower α is, the less "significant" H_0 is. Thus, when we accept H_1 with a small α, we are more confident in H_1 being true.

The test is carried out by examining the set of independent samples

$$ \Delta_i = C_i^{(2)} - C_i^{(1)} \tag{B.6} $$

as to whether their true mean Δ can be accepted as being positive with high confidence (low α). Assuming S is large enough, the hypothesis H_1 is accepted if

$$ \frac{\bar{\Delta}}{\sigma_{\bar{\Delta}}} > \mu \tag{B.7} $$

where

$$ \bar{\Delta} = \frac{1}{S}\sum_{i=1}^{S} \Delta_i \tag{B.8} $$

$$ \sigma_{\bar{\Delta}}^2 = \frac{1}{S^2}\sum_{i=1}^{S} (\Delta_i - \bar{\Delta})^2 \tag{B.9} $$

and, in view of the central limit theorem, μ is taken from the normal distribution tables. For example, for μ = 1.65, α = 5 percent, and for μ = 2.33, α = 1 percent. The corresponding confidence in the statement that algorithm 1 is superior to algorithm 2 is then 1 − α.
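The test (B.6) through (B.9) can be sketched in a few lines. The sketch below, with hypothetical function names and synthetic cost samples generated from common random numbers (mirroring the use of the same disturbances for both algorithms), computes the statistic Δ̄/σ_Δ̄ and compares it with the threshold μ taken from the standard normal distribution.

```python
import math
import random
from statistics import NormalDist


def paired_test(costs_1, costs_2, alpha=0.05):
    """One-sided test of H1: algorithm 1 has the lower expected cost, per (B.6)-(B.9).

    costs_1[i] and costs_2[i] must come from runs driven by the same random variables.
    Returns the test statistic and whether H1 is accepted at level alpha.
    """
    s = len(costs_1)
    deltas = [c2 - c1 for c1, c2 in zip(costs_1, costs_2)]          # (B.6)
    mean = sum(deltas) / s                                           # (B.8)
    var_of_mean = sum((d - mean) ** 2 for d in deltas) / s ** 2      # (B.9)
    stat = mean / math.sqrt(var_of_mean)
    mu = NormalDist().inv_cdf(1.0 - alpha)                           # about 1.645 for alpha = 0.05
    return stat, stat > mu                                           # (B.7)


# Synthetic example: a common run-to-run cost level, algorithm 1 about 10 units cheaper.
rng = random.Random(0)
common = [rng.gauss(100.0, 30.0) for _ in range(200)]
costs_alg1 = [c - 10.0 + rng.gauss(0.0, 2.0) for c in common]
costs_alg2 = list(common)
print(paired_test(costs_alg1, costs_alg2))
```

Because both cost sequences share the large run-to-run variation in `common`, the differences Δ_i cancel it, so the paired statistic is far larger than an unpaired comparison of the two sample means would give.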
Stochastic Dynamic Programming:
Caution and Probing

YAAKOV BAR-SHALOM, SENIOR MEMBER, IEEE


Abstract: The purpose of this paper is to unite the concepts of caution and probing put forth by Feldbaum [14] with the mathematical technique of stochastic dynamic programming originated by Bellman [5]. The decomposition of the expected cost in a stochastic control problem, recently developed in [8], is used to assess quantitatively the caution and probing effects of the system uncertainties on the control. It is shown how in some problems, because of the uncertainties, the control becomes cautious (less aggressive), while in other problems it will probe (by becoming more aggressive) in order to enhance the estimation/identification while controlling the system. Following this, a classification of stochastic control problems according to the dominant effect is discussed. This is then used to point out which are the stochastic control problems where substantial improvements can be expected from using a sophisticated algorithm versus a simple one.

I.  Introduction

THIS PAPER reviews recent work in the area of stochastic control and shows how the concepts of caution and probing, originated by Feldbaum [14], can be unified with Bellman's dynamic programming technique [5], [6]. The concepts of caution and probing, developed by Feldbaum [14] about 20 years ago and also discussed in [16], dealt from an intuitive point of view with some phenomena peculiar to stochastic control problems or decision under uncertainty.
In the presence of uncertainty, modeled by random variables or stochastic processes, there is usually a deterioration of the system performance, which can be measured by an increase in the (expected) loss function compared to the deterministic case. In order to reduce the increase in the loss function, the controller will tend to be "cautious," a property known in the decision theory literature as "risk aversion" [12]. This phenomenon occurs for convex loss functions that the decision maker (controller) wants to minimize, as in most control problems. On the other hand, in multistage problems where observations are made on the system at each stage, the controller might be able to carry out what has been called "active information gathering" or "probing" of the system for estimation enhancement. This is possible when the controller affects not only the state of the system but also the quality of the estimation process, i.e., has the so-called "dual effect."

This paper intends to provide a tutorial on these aspects of stochastic control by a suitable presentation of the basic concepts embodied in the stochastic dynamic programming. When the caution and probing phenomena are present in multistage problems, the optimal solution is not known. In view of this, insight is provided by considering a suboptimal algorithm that has the features of the optimal one.

Section II discusses the information state in the multistage control problem of a stochastic system. The formulation of the principle of optimality for stochastic systems and the resulting stochastic dynamic programming equation for additive cost functions are discussed in Section III. It is pointed out how the "preposterior analysis" technique is a direct consequence of the principle of optimality. The definition of the dual effect and the types of approximate solutions of the stochastic dynamic programming are the topic of Section IV. The "closed-loop" approximation of the stochastic dynamic programming using the "wide-sense" information state [8], [29], [30] is shown in Section V to lead to a decomposition of the expected cost into three terms. Two of these terms can be associated directly with the caution and probing effects discussed earlier, thus giving a quantitative measure of these effects. It is shown in Section VI how one can classify stochastic control problems according to the dominant term in the cost decomposition. This is then illustrated via a number of examples where stochastic control problems that are probing-dominated, caution-dominated, and essentially deterministic are presented. The effect of various state weightings in the cost function and the anticipated future learning are also discussed. Conclusions are presented in Section VII.

II.  The Information State in a Stochastic
Control Problem

The principle of optimality of Bellman [5] can be stated as follows for stochastic problems: at any time, whatever the present information and past decisions, the remaining decisions must constitute an optimal policy with regard to the current information set.

In the deterministic case the information set is the state of the system. This, together with the controller's subsequent decisions, fully determines the future evolution of the system. In the stochastic case the information set is, loosely, what the controller knows about the system. This will be discussed in more detail next.

Consider the following general stochastic control problem. The state x evolves according to the equation

$$ x(k+1) = f[k, x(k), u(k), v(k)], \qquad k = 0, 1, \ldots \tag{2.1} $$

where u is the control and v is the process noise. The measurements are described by

$$ y(k) = h[k, x(k), w(k)], \qquad k = 1, \ldots \tag{2.2} $$

where w is the measurement noise. The information set at time k is assumed to be the past measurements and controls

$$ I^k = \{Y^k, U^{k-1}\} \supset I^{k-1} \tag{2.3} $$

where

$$ Y^k \triangleq \{y(j)\}_{j=1}^{k}, \qquad U^{k-1} \triangleq \{u(j)\}_{j=0}^{k-1} \tag{2.4} $$

and the subscript 0 is omitted. The inclusion property in (2.3) points to the fact that the sequence of information sets assumed here is nested: each contains its predecessor.

Since (2.3) grows with k, it is of interest to know when a (nongrowing) information state can replace it.

Note that x(k) is a state only in the deterministic context when, together with the subsequent controls, it fully determines x(j) for all j > k, i.e., x(k) summarizes the past of the system. The stochastic counterpart of this is the "information state."

The information state is defined as a vector-valued variable or a function that summarizes the past (i.e., it can replace I^k) when we want to characterize (probabilistically) the future evolution of the system. This is more general than the "informative statistic" of Striebel [26], which is, roughly, what the optimal controller (for the problem under consideration) needs from the past data (2.3).

It is assumed in the sequel that all the pertinent probability densities exist. Discrete-valued random variables will have a probability density function (pdf) with Dirac delta functions at the locations of the point masses.

If both sequences of process and measurement noises are white and mutually independent, then at time k the conditional probability density function¹ of the vector x(k)

$$ p^k \triangleq p[x(k)\mid I^k] \tag{2.5} $$

is an information state. This can be seen from the following. The conditional density of x(k+1) can be written using Bayes' rule as

$$ p[x(k+1)\mid I^{k+1}] = \frac{1}{c}\, p[y(k+1)\mid x(k+1), I^k, u(k)]\; p[x(k+1)\mid I^k, u(k)] \tag{2.6} $$

where c is a normalization constant.

¹Rigorously, the conditional density should be written p[x(k) | Y^k; U^(k−1)] because it is conditioned on the sigma-algebra generated by the measurements, but it is not well-defined unless the values of past controls or control functions are indicated [2?]. For k = 0 this is the prior density of the state.
If the measurement noise is white (w(k+1) conditioned on x(k+1) has to be independent of w(j), j ≤ k, i.e., state-dependent measurement noise is allowed), then

$$ p[y(k+1)\mid x(k+1), I^k, u(k)] = p[y(k+1)\mid x(k+1)] \tag{2.7} $$

(the control is anyway irrelevant in the conditioning).

For an arbitrary value of the control at k one has

$$ p[x(k+1)\mid I^k, u(k)] = \int p[x(k+1)\mid x(k), I^k, u(k)]\; p[x(k)\mid I^k, u(k)]\; dx(k). \tag{2.8} $$

If the process noise sequence is white and independent of the measurement noises (v(k) conditioned on x(k) has to be independent of v(j − 1), w(j), j ≤ k, i.e., state-dependent process noise is allowed), then

$$ p[x(k+1)\mid x(k), I^k, u(k)] = p[x(k+1)\mid x(k), u(k)] \tag{2.9} $$

and.  since 

r[x{k  )|/‘.m(A  )]  •-  />[  v(A)|/‘]=>*  (2.10) 

then,  inserting  (2.9)  and  (2.10)  into  (2.8)  it  follows  that 

/’[ -vl  A  -  l  ):/*.«(  A  )]  <>[a  *  1.  >*.«(  A  )j.  (2.11) 

Now, using (2.7) and (2.11) in (2.6), one has

$$\pi^{k+1} = \Psi[k+1, \pi^k, y(k+1), u(k)], \tag{2.12}$$

i.e., $I^k$ is summarized by $\pi^k$. Equation (2.12) is the recursion for the information state.
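As an illustration of the recursion (2.6) to (2.12), the following is a minimal sketch assuming a finite state space, so that $\pi^k$ is a probability vector and the integral in (2.8) becomes a sum. The transition matrices and sensor likelihoods (`example_trans`, `example_lik`) are hypothetical placeholders chosen only for the example; they are not part of the original development.

```python
from typing import Callable, List

Vector = List[float]
Matrix = List[List[float]]  # Matrix[i][j] = p[x(k+1) = j | x(k) = i, u(k)]


def predict(pi: Vector, trans: Matrix) -> Vector:
    """Finite-state version of (2.8): p[x(k+1) | I^k, u(k)]."""
    n = len(pi)
    return [sum(pi[i] * trans[i][j] for i in range(n)) for j in range(n)]


def bayes_update(pred: Vector, lik: Vector) -> Vector:
    """Bayes step (2.6): weight by p[y(k+1) | x(k+1)] and divide by c."""
    unnorm = [p * l for p, l in zip(pred, lik)]
    c = sum(unnorm)  # normalization constant c in (2.6)
    if c == 0.0:
        raise ValueError("measurement has zero probability under the prediction")
    return [v / c for v in unnorm]


def info_state_step(pi: Vector, u: int, y: int,
                    trans: Callable[[int], Matrix],
                    lik: Callable[[int], Vector]) -> Vector:
    """pi^{k+1} = Psi[k+1, pi^k, y(k+1), u(k)], cf. (2.12)."""
    return bayes_update(predict(pi, trans(u)), lik(y))


# Hypothetical two-state example: u = 0 keeps the state, u = 1 swaps it.
def example_trans(u: int) -> Matrix:
    return [[1.0, 0.0], [0.0, 1.0]] if u == 0 else [[0.0, 1.0], [1.0, 0.0]]


# Hypothetical sensor: reports the true state with probability 0.8.
def example_lik(y: int) -> Vector:
    return [0.8, 0.2] if y == 0 else [0.2, 0.8]


if __name__ == "__main__":
    pi = [0.5, 0.5]
    for u, y in [(0, 0), (1, 1)]:
        pi = info_state_step(pi, u, y, example_trans, example_lik)
        print(u, y, pi)
```

Only the previous probability vector, the control, and the new measurement enter each step, which is the finite-state counterpart of $I^k$ being summarized by $\pi^k$.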

From the smoothing property of expectations it also follows that, for $j > k$,

$$\begin{aligned}
p[x(j) \mid I^k, U_k^{j-1}] &= \int p[x(j) \mid x(k), I^k, U_k^{j-1}]\; p[x(k) \mid I^k]\, dx(k) \\
&= \int p[x(j) \mid x(k), U_k^{j-1}]\; \pi^k\, dx(k) \\
&= \psi[j, \pi^k, U_k^{j-1}]
\end{aligned} \tag{2.13}$$

where $U_k^{j-1} = \{u(k), \dots, u(j-1)\}$, and the whiteness of the process noise sequence and its independence from the measurement noises have been used again.

Therefore, the whiteness and mutual independence of the two noise sequences is a sufficient condition for $\pi^k$ to be an information state. It should be emphasized that whiteness is the crucial assumption. This is equivalent to the requirement that $x(k)$ be an incompletely observed Markov process. If, for example, the process noise sequence is not white, it is obvious that $\pi^k$ does not summarize the past data. In this case the vector $x$ is not a state anymore and it has to be augmented (see, e.g., [31]). This discussion points out the reason why stochastic control problems are formulated with white noise sequences.


III.  From  the  Principle  of  Optimality  to 
Stochastic  Dynamic  Programming 

Consider the problem where the number $N$ of time steps is finite and deterministic. In general, the terminal time can be a random variable, possibly depending on the state or a decision variable. The present discussion is limited to fixed terminal time problems. See, e.g., [11], [18] for discussions of the free end-time problem. Denote the (scalar) cost function of the problem as

$$C = C(X^N, U^{N-1}) \tag{3.1}$$

where $X^N$ denotes the state sequence $x(0), \dots, x(N)$ and $U^{N-1}$ the control sequence $u(0), \dots, u(N-1)$.

Since this is a random variable, the minimization (in general, extremization) is done in the Bayesian approach on the expected cost

$$J = E\{C\}. \tag{3.2}$$

We assume here that the minimum and, therefore, an optimal solution (policy) exist. Otherwise, the infimum of (3.2) is to be obtained, and then only an $\varepsilon$-optimal policy exists (see, e.g., [11, p. 42]). Other approaches, such as min-max and worst-case distribution, are also used sometimes, but they are usually more difficult.

In order for (3.2) to be a well-defined criterion, the expectation must exist, i.e., all the variables entering into the cost must be either deterministic or random (with suitable moment conditions that guarantee the existence of the expected cost). No "unknown constants" can be used in formulating stochastic control problems with the Bayesian approach.

If there are unknown system parameters, they have to be modeled as random variables with an a priori pdf. If these parameters are time invariant, then one has a single realization from the prior pdf, i.e., an unknown system model generated by a probabilistic mechanism before the start of the process. In this case the minimization of the expected cost implies that we want to find the optimal policy

1) over all possible initial conditions (as specified by their pdf);

2) over all possible values of the unknown parameters (whose realization is according to the corresponding pdf), i.e., the ensemble of systems perceived by the controller in view of its uncertainty;

3) over all possible disturbance sequences.

When there are unknown time-invariant or slowly varying system parameters, the stochastic controller can then be adaptive, i.e., it will (hopefully) "learn" the system parameters during the control period.

The causality condition is that any decision function must depend only on the information set available at the time it has to be computed, i.e.,

$$u(k) = u(k, I^k), \qquad k = 0, 1, \dots, N-1. \tag{3.3}$$

Since the principle of optimality states that every end part of the decision process must be optimal, the multistage optimization has to be started from the last stage. The last decision, $u(N-1)$, must be optimal with regard to the information set available when it has to be computed, i.e., it will be obtained from the functional minimization

$$\min_{u(N-1)} E\{C \mid I^{N-1}\} \tag{3.4}$$

where $C$ is the cost for the entire problem.

The next-to-last decision, $u(N-2)$,

1) must be optimal with respect to (w.r.t.) $I^{N-2}$, and

2) is to be made knowing that the remaining decision $u(N-1)$ will be optimal w.r.t. $I^{N-1} \supset I^{N-2}$.

Thus, the (functional) minimization that yields the decision function at $N-2$ is

$$\min_{u(N-2)} E\Big[\min_{u(N-1)} E\{C \mid I^{N-1}\} \,\Big|\, I^{N-2}\Big] \tag{3.5}$$

and it uses the result of the functional minimization (3.4).

Note that the outside averaging in (3.5) is over $y(N-1)$ using the conditional density

$$p[y(N-1) \mid I^{N-2}, u(N-2)] \tag{3.6}$$

parameterized by the control at $N-2$. Since this measurement is not yet available when $u(N-2)$ is to be computed, but it will be available for $u(N-1)$, it is "averaged out" in (3.5).
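The averaging in (3.5) with the preposterior density (3.6) can be checked numerically on a toy problem. The sketch below assumes a hypothetical two-stage setting that is not from the source: a binary unknown $x$ with a uniform prior, a first decision $u(N-2)$ that either buys a measurement of accuracy 0.9 at cost 0.1 (`"probe"`) or takes an uninformative one for free (`"skip"`), and a last decision $u(N-1)$ that guesses $x$ at unit cost for a wrong guess.

```python
from typing import Dict, List

# Hypothetical two-stage problem (not from the source).
PRIOR: List[float] = [0.5, 0.5]                            # p[x | I^{N-2}], x in {0, 1}
ACCURACY: Dict[str, float] = {"probe": 0.9, "skip": 0.5}    # p[y(N-1) = x | x, u(N-2)]
FIRST_COST: Dict[str, float] = {"probe": 0.1, "skip": 0.0}  # cost of u(N-2)


def likelihood(u: str, y: int, x: int) -> float:
    """p[y(N-1) | x, u(N-2)] for the hypothetical sensor."""
    return ACCURACY[u] if y == x else 1.0 - ACCURACY[u]


def preposterior(u: str, y: int) -> float:
    """Preposterior density (3.6): p[y(N-1) | I^{N-2}, u(N-2)]."""
    return sum(likelihood(u, y, x) * PRIOR[x] for x in (0, 1))


def posterior(u: str, y: int) -> List[float]:
    """p[x | I^{N-1}] once y(N-1) has been received."""
    joint = [likelihood(u, y, x) * PRIOR[x] for x in (0, 1)]
    total = sum(joint)
    return [j / total for j in joint]


def last_decision_cost(post: List[float]) -> float:
    """(3.4): min over the guess u(N-1) of E[1{guess != x} | I^{N-1}]."""
    return min(1.0 - post[guess] for guess in (0, 1))


def expected_cost(u: str) -> float:
    """Argument of the outer minimization in (3.5): y(N-1) is averaged out."""
    return FIRST_COST[u] + sum(
        preposterior(u, y) * last_decision_cost(posterior(u, y)) for y in (0, 1))


best_first_decision = min(ACCURACY, key=expected_cost)

if __name__ == "__main__":
    for u in ACCURACY:
        print(u, expected_cost(u))
    print("optimal u(N-2):", best_first_decision)
```

With these numbers, probing costs 0.1 but lowers the expected cost of the last decision from 0.5 to 0.1, so the expected total is 0.2 versus 0.5 and the first decision chooses to gather information.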

The last two steps described above are entirely similar to the "preposterior analysis" technique from the operations research literature discussed, e.g., in [22]. This technique is usually formulated in the following context. The first decision [here $u(N-2)$] is for information gathering by an experiment from which a posterior information will result [here $y(N-1)$] that will be used to make the last decision [here $u(N-1)$]. The prior (to the experiment) probability density of the (posterior) result of the experiment is called the "preposterior density," and in the present problem this is (3.6). Thus, one can say that preposterior analysis, which is "anticipation" (in a statistical sense, i.e., causal) of future information, is a consequence of the principle of optimality.

From the above discussion it can be seen that the principle of optimality's statement that, at every stage, "the remaining decisions must constitute an optimal policy with regard to the current information set" implies the following: every decision has to use the available "hard" information (2.3) and the "soft" information (3.6) about the subsequent hard information. This can be paraphrased as: the optimal controller has to know how to use what it knows as well as what it knows about what it shall know.

The extension of (3.5) to the full $N$-stage process yields the optimal expected cost starting from the initial time as

$$J^*(0, I^0) = \min_{u(0)} E\Big\{\cdots \min_{u(N-2)} E\Big[\min_{u(N-1)} E\{C \mid I^{N-1}\} \,\Big|\, I^{N-2}\Big] \cdots \,\Big|\, I^0\Big\} \tag{3.7}$$

where $I^0$ is the initial information. Note that this equation does not assume any particular form for the cost function $C$.

For the additive cost given by

$$C(k) = c[N, x(N)] + \sum_{j=k}^{N-1} c[j, x(j), u(j)] \tag{3.8}$$

the minimization (3.7) of $C(0)$, the cost starting from the initial time 0, yields the discrete-time stochastic dynamic programming equation. Dynamic programming can be applied only to the so-called class of "decomposable" cost functions, as pointed out in [21], [23]. The additive cost (3.8) belongs to this class.

Since

$$C^k = \sum_{j=0}^{k} c[j, x(j), u(j)] \tag{3.9}$$

is independent of $U_{k+1}^{N-1}$, and using the smoothing property of the expectation operator, i.e.,

$$E[E(\cdot \mid I^j) \mid I^k] = E[\cdot \mid I^k], \qquad j > k \tag{3.10}$$

one has from (3.7), with the shorthand $c(j) = c[j, x(j), u(j)]$ and $c(N) = c[N, x(N)]$,

$$\begin{aligned}
J^*(0, I^0) &= \min_{u(0)} E\Big\{\cdots \min_{u(N-2)} E\Big[\min_{u(N-1)} E\Big\{c(N) + \sum_{j=0}^{N-1} c(j) \,\Big|\, I^{N-1}\Big\} \,\Big|\, I^{N-2}\Big] \cdots \,\Big|\, I^0\Big\} \\
&= \min_{u(0)} E\Big\{\cdots \min_{u(N-2)} E\Big[C^{N-2} + \min_{u(N-1)} E\{c(N-1) + c(N) \mid I^{N-1}\} \,\Big|\, I^{N-2}\Big] \cdots \,\Big|\, I^0\Big\} \\
&= \min_{u(0)} E\Big\{c(0) + \min_{u(1)} E\Big\{c(1) + \cdots + \min_{u(N-2)} E\Big[c(N-2) + \min_{u(N-1)} E\{c(N-1) + c(N) \mid I^{N-1}\} \,\Big|\, I^{N-2}\Big] \cdots \,\Big|\, I^1\Big\} \,\Big|\, I^0\Big\}.
\end{aligned} \tag{3.11}$$

In the above, the cost summands have been moved to the left, outside the minimizations that are not relevant for them.

Rewriting (3.11) in (backward) recursive form yields the Bellman equation

$$J^*(k, I^k) = \min_{u(k)} E\big\{c[k, x(k), u(k)] + J^*(k+1, I^{k+1}) \,\big|\, I^k\big\}, \qquad k = N-1, \dots, 0 \tag{3.12}$$

where $J^*(k, I^k)$ is the optimal cost-to-go from time $k$ to the end, and its dependence on the information set available at $k$ is explicitly pointed out. The terminal condition for (3.12) is

$$J^*(N, I^N) = E\{c[N, x(N)] \mid I^N\} \tag{3.13}$$

where the last measurement is irrelevant since it is averaged out immediately.

The stochastic dynamic programming functional equation (3.12) resulted from the use of the principle of optimality embodied in (3.7) for the additive cost (3.8). The recursion was obtained by moving the cost summands to the left in (3.11).
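A minimal sketch of the backward recursion (3.12) follows, under assumptions that are not in the source: the only uncertainty is a binary parameter $\theta$, so the information state is the scalar posterior $p[\theta = 1 \mid I^k]$; each control either buys a measurement of hypothetical accuracy 0.8 at stage cost 0.05 or skips it; and the terminal condition (3.13) is replaced by the hypothetical $J^*(N) = \min(p, 1-p)$, the error probability of the best final guess. The recursion enumerates the finite measurement tree exactly, which is only feasible for very small $N$.

```python
from typing import Dict, Tuple

# Hypothetical problem data (not from the source).
ACCURACY: Dict[str, float] = {"probe": 0.8, "skip": 0.5}    # p[y = theta | theta, u]; keep in (0, 1)
STAGE_COST: Dict[str, float] = {"probe": 0.05, "skip": 0.0}  # c[k, u(k)]


def bayes(p1: float, u: str, y: int) -> Tuple[float, float]:
    """Return p[y | I^k, u] and the updated p[theta = 1 | I^{k+1}]."""
    acc = ACCURACY[u]
    l1 = acc if y == 1 else 1.0 - acc   # p[y | theta = 1, u]
    l0 = 1.0 - acc if y == 1 else acc   # p[y | theta = 0, u]
    p_y = l1 * p1 + l0 * (1.0 - p1)     # preposterior density of y
    return p_y, l1 * p1 / p_y


def terminal_cost(p1: float) -> float:
    """Hypothetical J*(N): error probability of the best final guess of theta."""
    return min(p1, 1.0 - p1)


def cost_to_go(k: int, n_final: int, p1: float) -> Tuple[float, str]:
    """Bellman recursion (3.12) on the information state p1 = p[theta = 1 | I^k].

    Returns (J*(k), optimal u(k)); the control is "" at the terminal time.
    """
    if k == n_final:
        return terminal_cost(p1), ""
    best_cost, best_u = float("inf"), ""
    for u in ACCURACY:
        total = STAGE_COST[u]
        for y in (0, 1):
            p_y, p1_next = bayes(p1, u, y)
            total += p_y * cost_to_go(k + 1, n_final, p1_next)[0]
        if total < best_cost:
            best_cost, best_u = total, u
    return best_cost, best_u


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, cost_to_go(0, n, 0.5))
```

For $N = 1$ and a uniform prior the recursion gives $J^* = 0.25$ with `"probe"`, while for a posterior already concentrated at $\theta = 1$ it returns `"skip"` at zero cost, since nothing remains to be learned.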

An equivalent approach, based on the "basic lemma of stochastic control" [2], is as follows. This basic lemma states that

$$\min_{u(\cdot)} E[c(x, u(y))] = \min_{u(\cdot)} E\{E[c(x, u(y)) \mid y]\} = E\{\min_{u} E[c(x, u) \mid y]\} \tag{3.14}$$

i.e., if a measurement $y$ related to $x$ is available, then the minimization of the conditional expectation [the right-hand side (RHS) of (3.14)] yields the absolute minimum. This is equivalent to the statement that minimizing an integral [the outside expectation in (3.14)] is best done by minimizing the integrand at each point via the function $u(y)$, i.e., "feedback," instead of choosing a single value for the entire integral, i.e., "open loop." In other words, moving a minimization inside a sequence of expectations, so that it stands in front of a conditional expectation (conditioned on all the available information), is what is needed for the global minimum.
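The lemma (3.14) can be verified by brute force on a small discrete problem. In the sketch below, which uses hypothetical numbers, $x \in \{0, 1\}$ is equally likely, $y$ reports $x$ correctly with probability 0.9, and $c(x, u)$ is 1 for $u \ne x$ and 0 otherwise. The best open-loop value is compared with the best feedback law $u(y)$, found both by enumerating all four functions of $y$ and by minimizing the conditional expectation separately for each $y$.

```python
from itertools import product
from typing import Dict, Tuple

# Hypothetical problem data.
P_X: Dict[int, float] = {0: 0.5, 1: 0.5}       # prior on x
P_Y_GIVEN_X: Dict[Tuple[int, int], float] = {  # p[y | x], keyed (y, x)
    (0, 0): 0.9, (1, 0): 0.1,
    (0, 1): 0.1, (1, 1): 0.9,
}
CONTROLS = (0, 1)
MEASUREMENTS = (0, 1)


def c(x: int, u: int) -> float:
    """Unit cost for a control that does not match x."""
    return 0.0 if u == x else 1.0


def open_loop_value() -> float:
    """min over a single u of E[c(x, u)]: the measurement is ignored."""
    return min(sum(P_X[x] * c(x, u) for x in P_X) for u in CONTROLS)


def policy_value(policy: Dict[int, int]) -> float:
    """E[c(x, u(y))] for a feedback law given as a table y -> u."""
    return sum(P_X[x] * P_Y_GIVEN_X[(y, x)] * c(x, policy[y])
               for x in P_X for y in MEASUREMENTS)


def best_policy_by_enumeration() -> float:
    """Left side of (3.14): minimize over all functions u(y)."""
    return min(policy_value(dict(zip(MEASUREMENTS, us)))
               for us in product(CONTROLS, repeat=len(MEASUREMENTS)))


def best_policy_by_lemma() -> float:
    """Right side of (3.14): minimize separately for each observed y.

    sum_x p(x, y) c(x, u) equals p(y) E[c(x, u) | y], so minimizing it for each y
    and summing over y gives E{min_u E[c(x, u) | y]}.
    """
    return sum(min(sum(P_X[x] * P_Y_GIVEN_X[(y, x)] * c(x, u) for x in P_X)
                   for u in CONTROLS)
               for y in MEASUREMENTS)


if __name__ == "__main__":
    print(open_loop_value(), best_policy_by_enumeration(), best_policy_by_lemma())
```

With these numbers the open-loop value is 0.5, while both feedback computations give 0.1.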
Thus, based upon (3.14), the expected cost is minimized as follows:

$$\begin{aligned}
\min_{u(0), \dots, u(N-1)} E\{C \mid I^0\} &= \min_{u(0), \dots, u(N-1)} E\Big\{\cdots E\big[E(C \mid I^{N-1}) \mid I^{N-2}\big] \cdots \,\Big|\, I^0\Big\} \\
&= \min_{u(0)} E\Big\{\cdots \min_{u(N-2)} E\Big[\min_{u(N-1)} E(C \mid I^{N-1}) \,\Big|\, I^{N-2}\Big] \cdots \,\Big|\, I^0\Big\}
\end{aligned} \tag{3.15}$$

i.e., exactly (3.7). Note that the nestedness property (2.3) of the sequence $I^k$ was used above.


IV. Dual Effect, Caution, and Probing

The solution of multistage stochastic decision processes, either in the general form (3.7) or in the stochastic dynamic programming form (3.12) for an additive cost, is a formidable problem. Unless an explicit form is found for the optimal cost-to-go in (3.12), one cannot solve this functional equation except numerically. The curse of dimensionality [6] that afflicts deterministic dynamic programming is further compounded by the expectation operators in the stochastic case, making it unsolvable with a few exceptions (in addition to numerical minimization, numerical calculation of the conditional expectations also has to be carried out, which is practically impossible).

The few exceptions are the linear-quadratic problem [1], [2], [7], the linear-exponential-quadratic-Gaussian problem [24], and a linear system with a special form of cost (even powers of the state up to the sixth) [25].

Since one cannot obtain the optimal stochastic controller, it is of interest to find suitable approximations for the stochastic dynamic programming. Such an approximation should preserve the preposterior analysis property of the principle of optimality mentioned in the previous section and allow an assessment of the effect of uncertainties (imperfect information: present and future) on the controller and its performance.

The approximations of the stochastic dynamic programming fall in the following two classes.

1) Feedback Type Algorithms: In this case the control depends only on the current information

$$u(k) = u(k, I^k) \tag{4.1}$$

but does not use the prior statistical description of the future posterior information

$$p[y(j+1) \mid I^j], \qquad j \ge k. \tag{4.2}$$

2) Closed-Loop Type Algorithms: Such a controller utilizes feedback (4.1) and anticipates future feedback via (4.2), i.e., it anticipates that the loop will stay closed.

Feldbaum [14] introduced the concept of dual effect in the control of stochastic dynamic systems. In a stochastic problem the control has, in general, two effects.

1) It affects the state (control action).

2) It affects the uncertainty of the state (augmented by the possibly unknown parameters).

A rather general mathematical definition of this has been given in [7] in terms of conditional central moments of the state vector. To illustrate it, let the conditional covariance of the state at $k$ be

$$\Sigma(k|k) = E\{[x(k) - \hat{x}(k|k)][x(k) - \hat{x}(k|k)]' \mid I^k\} \tag{4.3}$$

where $\hat{x}(k|k)$ denotes the conditional mean. Then, if $\Sigma(k|k)$ does not depend on the past controls $U^{k-1}$, the control has no dual effect (of second order), i.e., it is neutral. This is the case in linear dynamic systems with additive but not necessarily Gaussian noise [7], [32]. In nonlinear systems the state estimation accuracy is, in general, control dependent: the control has a dual effect.
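The distinction in (4.3) can be seen in two scalar cases, sketched below under hypothetical Gaussian assumptions. For a linear system with known coefficients and additive noise, the Kalman filter variance recursion does not involve $u$, so the control is neutral. If instead the gain $b$ in an observation $z = b\,u + w$ is an unknown parameter with prior variance $s$, its posterior variance depends on $u$, i.e., the control has a dual effect on the augmented state.

```python
def kalman_variance(p: float, a: float, q: float, r: float, u: float) -> float:
    """One predict/update of Var[x] for x(k+1) = a x(k) + b u(k) + v(k),
    z(k+1) = x(k+1) + w(k+1), with Var[v] = q and Var[w] = r.
    The argument u is accepted only to show that it never enters."""
    del u  # the known control shifts the mean, not the covariance
    p_pred = a * a * p + q
    return p_pred * r / (p_pred + r)


def gain_posterior_variance(s: float, r: float, u: float) -> float:
    """Posterior Var[b] for a Gaussian prior Var[b] = s after observing
    z = b u + w with Var[w] = r (linear-Gaussian update of the parameter)."""
    return 1.0 / (1.0 / s + u * u / r)


if __name__ == "__main__":
    for u in (0.0, 1.0, 3.0):
        print(u, kalman_variance(1.0, 1.0, 0.5, 1.0, u),
              gain_posterior_variance(1.0, 1.0, u))
```

Larger $|u|$ makes the observation more informative about $b$, which is the mechanism a probing controller exploits.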

If the system has unknown parameters, modeled as a realization of a vector-valued random variable, the control values will, in general, affect the information about them derived from the measurements. Since having more accurate estimates of the system parameters is intuitively beneficial for the controller, the idea that the controller should enhance their identification is appealing. The initial control should account for the fact that it is applied to a system with parameters drawn from the prior distribution and for the fact that their value can be further identified during the process. This is the adaptive or learning feature of the controller. A simple example that illustrates the dual effect of the control is given in the Appendix.

Therefore, the controller can be used for "active information storage" (estimation enhancement or uncertainty reduction) via what has been called probing [14]. Note that only a "closed-loop" algorithm can do this active information gathering. On the other hand, the existence of uncertainty in the system might have another effect. Since, in general, uncertainty in the system will increase the expected cost, the controller should be "cautious" not to further increase the effect of the existing uncertainties on the cost. A simple example to illustrate this "caution" effect is also given in the Appendix.
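A standard one-step illustration of caution (not necessarily the example in the Appendix) is sketched below, assuming a hypothetical scalar model $y = b\,u + e$ with $b$ of mean $\hat b$ and variance $\sigma^2$, $e$ zero-mean and independent of $b$, and cost $E[(y - y_{\text{ref}})^2]$. Expanding the expectation gives $(\hat b u - y_{\text{ref}})^2 + \sigma^2 u^2 + \operatorname{Var}[e]$, whose minimizer $\hat b y_{\text{ref}} / (\hat b^2 + \sigma^2)$ is smaller in magnitude than the certainty-equivalence control $y_{\text{ref}} / \hat b$ whenever $\sigma^2 > 0$.

```python
def cautious_control(b_hat: float, sigma2: float, y_ref: float) -> float:
    """argmin_u E[(b u + e - y_ref)^2] for E[b] = b_hat, Var[b] = sigma2."""
    return b_hat * y_ref / (b_hat ** 2 + sigma2)


def certainty_equivalence_control(b_hat: float, y_ref: float) -> float:
    """Treat b as if it were exactly b_hat."""
    return y_ref / b_hat


def expected_cost(u: float, b_hat: float, sigma2: float, y_ref: float,
                  noise_var: float = 0.0) -> float:
    """E[(b u + e - y_ref)^2] = (b_hat u - y_ref)^2 + sigma2 u^2 + Var[e]."""
    return (b_hat * u - y_ref) ** 2 + sigma2 * u * u + noise_var


if __name__ == "__main__":
    u_c = cautious_control(1.0, 1.0, 1.0)
    u_ce = certainty_equivalence_control(1.0, 1.0)
    print(u_c, expected_cost(u_c, 1.0, 1.0, 1.0))
    print(u_ce, expected_cost(u_ce, 1.0, 1.0, 1.0))
```

For $\hat b = 1$, $\sigma^2 = 1$, $y_{\text{ref}} = 1$ (noise variance taken as zero) the cautious control is 0.5 with expected cost 0.5, versus expected cost 1.0 for the certainty-equivalence control 1.0; with $\sigma^2 = 0$ the two controls coincide.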

The open-loop feedback (OLF) control [1], which belongs to the feedback class, works well in some problems. Nevertheless, it can suffer from the "turn-off" phenomenon, which can be avoided only by a closed-loop controller [15], [36]. As pointed out in [7], the optimal solution of the linear-quadratic control problem belongs to the feedback class because in this problem the control has no dual effect. Among the algorithms that belong to the feedback class are the heuristic certainty equivalence ("enforced separation") [10], [28], the self-tuning regulator [3], the cautious control [36], and the multiple model partitioned control [4], [13]. Algorithms of the closed-loop type are the wide-sense adaptive controller [8], [29], [30], the dual controllers of [27], [36], the innovations dual controller of [20], and the model adaptive dual controller for multiple models [37].

V. Caution and Probing Effects from the
Stochastic Dynamic Programming

The previous discussion pointed out qualitatively that a controller

1) has a direct control effect on the state;

2) can perform active information gathering (probing) to improve the accuracy of subsequent control actions; and

3) has to be cautious because of the existing uncertainties in the system.

While there is no universal agreement on the notions of caution and probing, this author believes these concepts are valuable in the derivation of suboptimal algorithms. In this section a quantification of the above properties is presented. This is obtained by an approximation of the optimal cost from the stochastic dynamic programming that results in a decomposition of the cost into three terms, each associated with one of the above items.

The stochastic dynamic programming equation (3.12) is approximated as follows [8], [29], [30]. First, instead of the exact information state, the following approximate "wide-sense" information state is used:

$$\mathcal{I}^k = \{\hat{x}(k|k), \Sigma(k|k)\} \tag{5.1}$$

i.e., the (approximate) conditional mean and covariance of $x(k)$ obtained, e.g., via an extended Kalman filter. The use of this "quasi-sufficient statistic" is needed for an algorithm that is implementable. Assume now that the system is at time $k$ and a closed-loop control (in the sense defined earlier) is to be computed using $\mathcal{I}^k$ and the present (statistical) knowledge about the future observations.
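The wide-sense information state (5.1) is propagated by a filter such as the extended Kalman filter. A scalar sketch of one EKF cycle is given below; the model functions passed in are hypothetical, and an implementation for the augmented state would use matrices. With a nonlinear measurement function the updated variance depends on the linearization point, and hence on the control, which is how the dual effect enters this approximation.

```python
from typing import Callable, Tuple

Fn2 = Callable[[float, float], float]  # f(x, u) and its derivative in x
Fn1 = Callable[[float], float]         # h(x) and its derivative


def ekf_step(x_hat: float, p: float, u: float, z: float,
             f: Fn2, df_dx: Fn2, h: Fn1, dh_dx: Fn1,
             q: float, r: float) -> Tuple[float, float]:
    """Map {x_hat(k|k), P(k|k)} to {x_hat(k+1|k+1), P(k+1|k+1)} for scalar models."""
    # Prediction, linearizing f about the current estimate.
    x_pred = f(x_hat, u)
    f_lin = df_dx(x_hat, u)
    p_pred = f_lin * p * f_lin + q
    # Update, linearizing h about the predicted state.
    h_lin = dh_dx(x_pred)
    s = h_lin * p_pred * h_lin + r  # innovation variance
    gain = p_pred * h_lin / s
    x_new = x_pred + gain * (z - h(x_pred))
    p_new = (1.0 - gain * h_lin) * p_pred
    return x_new, p_new


if __name__ == "__main__":
    # Hypothetical model: x(k+1) = x(k) + u(k), z = x^2 + w, q = 0, r = 1.
    for u in (1.0, 3.0):
        print(u, ekf_step(0.0, 1.0, u, 0.0, lambda x, uu: x + uu,
                          lambda x, uu: 1.0, lambda x: x * x,
                          lambda x: 2.0 * x, 0.0, 1.0))
```

In the linear case ($f = x + u$, $h = x$) the step reduces to the ordinary Kalman filter; with the hypothetical $h(x) = x^2$, starting from $\hat x = 0$, $P = 1$, $q = 0$, $r = 1$, the updated variance is 0.2 for $u = 1$ but $1/37$ for $u = 3$.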

The principle of optimality with the information state (5.1) yields the following stochastic dynamic programming equation for the closed-loop-optimal expected cost-to-go at time $k$:

$$J^*(k, \mathcal{I}^k) = \min_{u(k)} E\big\{c[k, x(k), u(k)] + J^*(k+1, \mathcal{I}^{k+1}) \,\big|\, \mathcal{I}^k\big\}. \tag{5.2}$$


The main problem is to obtain an approximate expression for $E\{J^*(k+1, \mathcal{I}^{k+1}) \mid \mathcal{I}^k\}$ preserving its closed-loop feature, i.e., this expression should incorporate the "value" of the future observations. In order to find an explicit solution, the cost-to-go $C(k+1)$ defined in (3.8) is expanded about a nominal trajectory (designated by subscript 0) generated by the recursion

$$x_0(j+1) = f[j, x_0(j), u_0(j), \bar{v}(j)], \qquad j = k+1, \dots, N-1 \tag{5.3}$$

where $u_0(j)$, $j = k+1, \dots, N-1$, is a sequence of nominal controls and $\bar{v}(j)$ is the mean of the process noise. The initial condition $x_0(k+1)$ is taken as the predicted value of the state at $k+1$ given $\mathcal{I}^k$ and the control (yet to be found) $u(k)$. The expansion of the cost-to-go from time $k+1$ is

$$C(k+1) = C_0(k+1) + \Delta C_0(k+1) \tag{5.4}$$

where $C_0(k+1)$ is the cost along the nominal (ignoring all the uncertainties) and $\Delta C_0(k+1)$ is the variation of the cost about the nominal, with terms up to second order obtained from a Taylor expansion, which will capture the stochastic effects. The approximation of the closed-loop-optimal expected cost-to-go from time $k+1$ is now done as follows:

$$J^*(k+1) = C_0(k+1) + \Delta J_0^*(k+1) \tag{5.5}$$

where the optimal "closed-loop" perturbation cost is

$$\Delta J_0^*(k+1) = \min_{\delta u(k+1)} E\Big\{\cdots \min_{\delta u(N-1)} E\big[\Delta C_0(k+1) \mid \mathcal{I}^{N-1}\big] \cdots \,\Big|\, \mathcal{I}^{k+1}\Big\} \tag{5.6}$$

and $\delta u(j) = u(j) - u_0(j)$. This minimization problem is quadratic since, by construction, $\Delta C_0(k+1)$ is quadratic in $\delta u(j)$, $k+1 \le j \le N-1$, as well as in the variations about the nominal trajectory, $\delta x(j) = x(j) - x_0(j)$, $k+1 \le j \le N$. Using a Taylor series expansion of (2.1) and including second-order terms results in a set of perturbation state equations in $\delta x(j)$ with $\delta x(k+1) = x(k+1) - x_0(k+1)$ as an initial condition. Thus, the problem posed in (5.6) consists of minimizing a quadratic cost given a quadratic system of state equations, and is somewhat similar to the linear-quadratic control problem. Then, assuming a solution quadratic in the perturbed state (i.e., neglecting higher order terms) and evaluating the expectations permits the optimal closed-loop (CL) cost-to-go to be obtained explicitly. See [8] for the details of the development. This result, obviously, depends on the approximations used in the derivation.

The  Cost  Decomposition 

The explicit expression of the (approximate) cost obtained can be decomposed as follows:

$$J_{CL}(k) = J_D(k) + J_C(k) + J_P(k) \tag{5.7}$$

where the subscript $D$ stands for deterministic, $C$ for caution, and $P$ for probing components.

It will be assumed, for simplicity, that

$$c[k, x(k), u(k)] = c_x[k, x(k)] + c_u[k, u(k)] \tag{5.8}$$

and that the process noise, whose covariance is $Q$, enters additively in (2.1). Then the deterministic component of the cost-to-go, excluding $c_x$ (which does not depend on the control), is given by

$$J_D(k) = c_u[k, u(k)] + C_0(k+1) + \gamma_0(k+1) \tag{5.9}$$

and the stochastic terms obtained via the perturbation problem are

$$J_C(k) = \tfrac{1}{2}\operatorname{tr}[K_0(k+1)\,\Sigma(k+1|k)] + \tfrac{1}{2}\sum_{j=k+1}^{N-1}\operatorname{tr}[K_0(j+1)\,Q(j)] \tag{5.10}$$

$$J_P(k) = \tfrac{1}{2}\sum_{j=k+1}^{N-1}\operatorname{tr}[\mathcal{A}_{0,xx}(j)\,\Sigma(j|j)]. \tag{5.11}$$

Here $\Sigma$ is the covariance of the augmented state, and $\gamma$, $K$, and $\mathcal{A}$ are given by appropriate recursions detailed in [8].
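Once the recursions of [8] have produced the weighting matrices, the scalar $\gamma_0(k+1)$, and the predicted future covariances, evaluating the deterministic, caution, and probing components of (5.7) is just trace arithmetic. The sketch below assumes those quantities are already available as plain nested lists indexed by time (the recursions themselves are not reproduced), and the numbers in the usage example are hypothetical.

```python
from typing import Dict, List

Matrix = List[List[float]]


def trace_product(a: Matrix, b: Matrix) -> float:
    """tr(A B) for square matrices of equal size."""
    n = len(a)
    return sum(a[i][j] * b[j][i] for i in range(n) for j in range(n))


def cost_decomposition(k: int, n_final: int, c_u: float, c0_next: float,
                       gamma0_next: float, k0: Dict[int, Matrix],
                       sigma_pred: Matrix, q: Dict[int, Matrix],
                       a0: Dict[int, Matrix],
                       sigma_upd: Dict[int, Matrix]) -> Dict[str, float]:
    """Deterministic, caution, and probing terms and their sum J_CL.

    k0[j], q[j], a0[j], sigma_upd[j] hold the time-j weighting matrix, process
    noise covariance, probing weight, and anticipated updated covariance;
    sigma_pred is the one-step predicted covariance at k+1.
    """
    j_d = c_u + c0_next + gamma0_next
    j_c = 0.5 * trace_product(k0[k + 1], sigma_pred) + 0.5 * sum(
        trace_product(k0[j + 1], q[j]) for j in range(k + 1, n_final))
    j_p = 0.5 * sum(trace_product(a0[j], sigma_upd[j])
                    for j in range(k + 1, n_final))
    return {"J_D": j_d, "J_C": j_c, "J_P": j_p, "J_CL": j_d + j_c + j_p}


if __name__ == "__main__":
    parts = cost_decomposition(
        k=0, n_final=3, c_u=0.1, c0_next=0.2, gamma0_next=0.0,
        k0={1: [[2.0]], 2: [[1.0]], 3: [[1.0]]}, sigma_pred=[[0.5]],
        q={1: [[0.1]], 2: [[0.1]]}, a0={1: [[4.0]], 2: [[2.0]]},
        sigma_upd={1: [[0.25]], 2: [[0.5]]})
    print(parts)
```

For the hypothetical scalar data in the usage example ($k = 0$, $N = 3$) this gives $J_D \approx 0.3$, $J_C = 0.6$, and $J_P = 1.0$.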

The stochastic term (5.10) reflects the effect on the cost of the uncertainty at time $k$, summarized by $\Sigma(k|k)$, and of the subsequent process noises. These uncertainties cannot be affected by $u(k)$, but their weightings do depend on it; e.g., $\Sigma(k+1|k)$ depends on $\Sigma(k|k)$ and $u(k)$. The effect of these uncontrollable uncertainties on the cost should be minimized by the control; this term indicates the need for the control to be cautious and thus is called the caution term. The stochastic term (5.11) accounts for the effect of uncertainties when subsequent decisions (corrective actions) will be made. The weighting of these future uncertainties is nonnegative ($\mathcal{A}_{0,xx}$ is positive semidefinite). If the control can reduce the future updated covariance by probing (experimentation), it can thus reduce the cost. The weighting matrix $\mathcal{A}_{0,xx}$ approximately yields the value of future information for the problem under consideration. Therefore, this is called the probing term. Note that even if the control has no dual effect, i.e., it does not affect the future covariance $\Sigma$ of the augmented state (which includes the random parameters), the weighting of these covariances might still be affected by the control. Therefore, this (admittedly approximate) procedure accounts not only for the dual effect but for all the stochastic effects in the performance index.

Thus, starting from the stochastic dynamic programming, one can see the following: the benefit of probing is weighed against its cost, and a compromise is chosen so as to minimize the sum of the deterministic, caution, and probing terms. The minimization of $J_{CL}$ will also achieve a tradeoff between the present and future actions according to the information available at the time the corresponding decisions are made.


The closed-loop control $u(k)$ is found from the minimization of (5.7) using a search procedure. At every $k$, to each control $u(k)$ for which (5.7) is evaluated during the search there corresponds a predicted state, and to this predicted state a sequence of deterministic controls is attached that defines the nominal trajectory. The only use of the nominals and perturbations is to make possible the evaluation of the cost-to-go optimized in a closed-loop manner. This procedure is repeated every time a new control is to be obtained.
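The search over $u(k)$ can be sketched as a plain grid search over candidate controls, evaluating $J_{CL} = J_D + J_C + J_P$ at each one. In an actual implementation each evaluation would build the predicted state, the nominal trajectory, and the perturbation solution; here the three components are hypothetical closed-form stand-ins, with a probing term that decreases as $|u(k)|$ grows, which produces a non-convex total with two symmetric minima. A grid (rather than gradient) search is used because a local method started at $u = 0$ would stall at the stationary point there.

```python
from typing import Callable, Iterable

CostFn = Callable[[float], float]


def closed_loop_control(candidates: Iterable[float], j_d: CostFn,
                        j_c: CostFn, j_p: CostFn) -> float:
    """Return the candidate u(k) minimizing J_CL(u) = J_D(u) + J_C(u) + J_P(u)."""
    return min(candidates, key=lambda u: j_d(u) + j_c(u) + j_p(u))


# Hypothetical stand-ins for the three components of (5.7).
def toy_j_d(u: float) -> float:
    return 0.1 * u * u          # control effort along the nominal


def toy_j_c(u: float) -> float:
    return 0.05 * u * u         # caution penalty grows with |u|


def toy_j_p(u: float) -> float:
    return 2.0 / (1.0 + u * u)  # probing: larger |u| identifies better


GRID = [i / 100.0 for i in range(-300, 301)]  # candidate u(k) in [-3, 3]

if __name__ == "__main__":
    u_star = closed_loop_control(GRID, toy_j_d, toy_j_c, toy_j_p)
    print(u_star)
```

With these stand-ins the grid search returns $-1.63$, the first encountered of the two symmetric minima at about $\pm 1.63$, while $u = 0$ is a stationary point with a much larger cost.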

The "quality" of the approximations used in the derivations outlined above, in particular the second-order expansions, is an open question. Only extensive Monte Carlo simulations with rigorous comparison with other algorithms (see, e.g., [37]) can answer these questions. For some problems [29], [30] significant performance improvements have been found. In other cases, where probing is not significant, the CL algorithm performed close to the OLF [8].

The cost decomposition is believed to provide the only insight we now have towards the understanding of complex stochastic control problems for which the optimal solution is unknown. Furthermore, the classification of various stochastic control problems presented in the next section, which is based on this decomposition, can be used as a tool to assess for which nonlinear problems stochastic control algorithms can provide significant performance improvements.


VI. Implications of the Cost Decomposition
and Examples

The decomposition of $J_{CL}$ presented above yields an explicit evaluation of the tradeoffs between direct control, active probing, and cautious action on the part of the controller. Thus, the ability of the control to affect learning as well as to steer the system to its targets can be numerically evaluated using this decomposition. This is a particularly attractive feature, for it captures both the need (and desire) of the controller to extract more information from the system and its aversion to drastic actions which may result in undesirable outcomes (risk aversion [12]). Furthermore, it also indicates whether the uncertainty dominates the problem: this is the case when the stochastic part of the cost ($J_C + J_P$) significantly exceeds the deterministic part ($J_D$).

If the uncertainty dominates the problem, then one can distinguish two cases.

1) The caution component $J_C$ dominates. Then, since this is "uncontrollable" uncertainty, one has a highly uncertain model which cannot be improved in the course of the control period.

2) The probing component $J_P$ dominates. Then, with the dual effect of the control, one can reduce the uncertainty of the model; thus the model, while uncertain at the beginning, might prove to be ultimately adequate for the control problem under consideration.
A third case occurs when we have the following.

3) The deterministic component of the cost $J_D$ dominates; then the parameter uncertainties are of no significant consequence.

The last case is the most desirable because then the controller can be of the certainty equivalence type [7], i.e., it can ignore the uncertainties by replacing all the random variables by their (conditional) means. This is the least expensive algorithm because it is essentially deterministic, and it will yield near-optimum performance. However, the stochastic control approach outlined above has to be used to reach this conclusion.
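The three cases can be written as a simple rule on the evaluated components. The text says only that one part of the cost "dominates" or "exceeds significantly" another, so the factor `ratio` below is a hypothetical threshold, and the fourth outcome, `"mixed"`, covers situations the three cases do not address.

```python
def classify(j_d: float, j_c: float, j_p: float, ratio: float = 2.0) -> str:
    """Map the cost decomposition (5.7) to cases 1) to 3) of Section VI.

    `ratio` is a hypothetical threshold for "dominates"; it is not from the text.
    """
    stochastic = j_c + j_p
    if stochastic >= ratio * j_d:
        # Uncertainty dominates: decide between cases 1) and 2).
        return "caution-dominated" if j_c >= j_p else "probing-dominated"
    if j_d >= ratio * stochastic:
        return "essentially deterministic"  # case 3)
    return "mixed"


if __name__ == "__main__":
    print(classify(0.1, 0.1, 0.8))
```

The rule only restates the qualitative cases; in practice the threshold would have to be chosen for the problem at hand.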

Wonham [33] stated, about ten years ago, the following. In the case of (stochastic) feedback controls the general conclusion is that only marginal improvement can be obtained (over a controller ignoring the stochastic features), unless the disturbance level is very high; in this case the fractional improvement may be large but the system is useless anyway.

This statement implies that with high-level disturbances (in which one can include large parameter uncertainties) one has a "hopeless" situation. The other extreme is the situation with low-level disturbances. These two situations seem to match, respectively, cases 1) and 3) from above. It was also pointed out in [33] that Feldbaum's dual control, which probes the system, might hold the promise of useful applications of stochastic control. However, at that time it was not clear whether such problems exist and, if so, how to obtain a (dual) controller that can effectively probe the system to reduce uncertainties. The wide-sense dual (or stochastic closed-loop) control algorithm [8], [29], presented in Section V, can then be used to obtain significant performance improvement.

As will be shown in the sequel, the cost decomposition presented above can answer affirmatively the question of whether there are probing-dominated stochastic control problems, i.e., problems falling in case 2) from above.

In the following, a number of examples are discussed to illustrate the usefulness of the cost decomposition and its implications. Some of these examples have appeared earlier in the literature, and they are reexamined in light of the recently gained quantitative understanding of the caution and probing effects from the cost decomposition.

A. A Probing-Dominated Problem (Terminal Guidance)

The first example is the interception problem from [30]. In this case a third-order linear system with six unknown (random) parameters and both process and measurement noises was considered. The augmented nine-dimensional state (for which the dynamic equation is obviously nonlinear) had an initial estimate and an associated covariance. The elements of this covariance matrix corresponding to the parameters reflected the fact that the initial estimates of the parameters were poor. The goal was to steer one of the (proper) state components to a target value by the terminal time, which was $N = 20$. This was expressed by a quadratic term for the terminal state. There was no cost associated with the state prior to the terminal time, and the cost weighting of the control, also entering quadratically, was low.

Fig. 1. Cost decomposition for a probing-dominated stochastic control problem (terminal guidance for a third-order system with six unknown parameters).

Fig. 1 presents the plot of the cost decomposition for the first-period control. It can be seen that this is a probing-dominated stochastic control problem: the probing component of the cost is approximately 80 percent of the total cost.

The performance of the wide-sense dual [or closed-loop (CL)] control described in Section V was compared in [30] via Monte Carlo runs to the HCE (heuristic certainty equivalence) control, in which the parameters' estimates were used as if they were the true values. The observed improvement of the CL algorithm versus HCE was, from (the modest number of) 20 Monte Carlo runs, around 85 percent [30]. This fractional improvement is quite close to the share of the probing cost in the total, as indicated above. The CL controller, via its dual effect, helped identify the system, i.e., it was actively adaptive, and this was the key factor in its better performance. This decomposition, which was not known at the time of the original work [30], can now be used to explain the observed performance improvement.

An important observation is that the probing component of the cost is not convex: the parameter identification is enhanced by large-magnitude first-period control values, both negative and positive. This lack of convexity of the probing component leads to local minima, as can be seen from Fig. 1. This phenomenon was pointed out in [27], [36]. The behavior of the multiple minima is discussed later in more detail.
The example discussed above, which is of the terminal state penalty type, belongs to the second class of problems, i.e., probing dominated.

B. A Caution-Dominated and an Essentially Deterministic Problem (Econometric Models)

Two additional problems, derived from econometrics, are discussed next. Both are macroeconometric models of the U.S. economy derived from the same data but under different assumptions. For a concise description of the models see [4], [10]. The first econometric model has three states (gross national product, investment, and consumption), is driven by the government expenditures input, and has five unknown parameters characterized by an initial estimate and covariance matrix. The second econometric model has 11 states (as above plus increments of these variables and some lagged values), the same input, and three unknown parameters.

The first model was obtained by Kendrick using ordinary least squares [17], while the second, more elaborate model was obtained by Wall using the full information maximum likelihood method [34], [35]. The cost was quadratic in the deviations of the three economic variables and the input from target values along the entire trajectory consisting of seven periods (economic quarters).

The analysis of the cost $J^{CL}(0)$ for the first econometric model, shown in Fig. 2, points to the fact that this problem is dominated by the caution term. This is due to the relatively large uncertainties in the initial parameter estimates. The probing component is negligible; this problem is completely dominated by the initial uncertainty and belongs to the first class defined at the beginning of the section. Note that both the caution and the probing terms tend to reduce the value of $u^{CL}$ versus $u^{HCE}$, i.e., they are not conflicting in this case.

Fig. 3 shows the cost for the second econometric model. The deterministic component dominates here and $u^{CL}(0)$ is very close to $u^{HCE}(0)$. The probing component is again negligible. This problem belongs to the third class: it is essentially deterministic.

C. Scalar Problem: Parametric Study of the Cost Shape

Another example of the application of the cost decomposition deals with a scalar linear system over $N = 2$ time periods, discussed in [19]:

$$x(k+1) = a\,x(k) + b\,u(k) + v(k), \qquad k = 0, 1 \qquad (6.1)$$

with $a = 0.7$ known and the unknown input gain $b$ having initial estimate $\hat{b}(0) = 0.6$ and variance $\sigma_b^2(0)$. The process noise $v(k)$ is zero mean and white with variance $V$. The goal is to keep the state $x$, which is perfectly observed, around zero. This is expressed by the quadratic cost

$$C = \tfrac{1}{2}\,Q(2)\,x^2(2) + \tfrac{1}{2}\,r\,[u^2(0) + u^2(1)] \qquad (6.2)$$

with terminal state weighting $Q(2)$ and control weighting $r = 0.1$. The initial state is $x(0) = 1$.
Fig. 2. Cost decomposition for a caution-dominated stochastic control problem (third-order econometric model with five unknown parameters).
Fig. 3. Cost decomposition for an essentially deterministic stochastic control problem (eleventh-order econometric model with three unknown parameters).

Fig. 4 presents the cost decomposition at $k = 0$ (first period) for the initial gain uncertainty $\sigma_b^2(0) = 0.52$, process noise variance $V = 0.2$, and terminal state weighting $Q(2) = 10$. The probing component of the cost, which varies drastically with the control, yields two minima for the total cost. It is of interest to see how these minima behave as the terminal state weighting changes. This is illustrated in Fig. 5. For even larger terminal weighting the two minima get further apart, while for a lower weighting, $Q(2) = 1$, there is only one minimum left. In this latter case the lighter terminal penalty does not justify a major control effort to identify the parameter $b$ accurately, and $u^{CL}$ is quite close to $u^{HCE}$.

Fig. 4. Cost decomposition for a two-stage problem with unknown input gain (scalar system).

Fig. 5. The closed-loop cost for various terminal state weightings (scalar system).

Another aspect of interest is how the anticipated future learning changes the present behavior of the CL controller. To this end the variance of the process noise was varied. Fig. 6 shows the cost $J^{CL}(0)$ for $Q(2) = 1000$, $\sigma_b^2(0) = 2$, and various values of $V$. For large process noise variance, less learning is anticipated and the cost curve is relatively flat, even though it has two minima, wide apart. For low process noise variance the cost curve has a very high maximum at $u(0) = 0$ (when no learning of $b$ occurs) and then two sharp minima around this point.

Fig. 6. Effect of the anticipated future learning on the control (scalar system).
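Because the state is perfectly observed and $N = 2$, the last control of (6.1)-(6.2) can be found in closed form, so the closed-loop expected cost as a function of $u(0)$ reduces to a one-dimensional expectation over the innovation of $x(1)$. The Python sketch below evaluates that expectation numerically. It is an exact two-stage dynamic programming evaluation under the stated Gaussian model, not the wide-sense dual approximation of Section V; the default values are those quoted for Fig. 4 (with $\sigma_b^2(0) = 0.52$ as read from the text), and the function names, the trapezoidal quadrature, and its grid are illustrative assumptions.

```python
import math


def last_stage_cost(x1, b_hat1, var_b1, a, Q, r, V):
    """Minimum over u(1) of E[Q x(2)^2 / 2 + r u(1)^2 / 2] given x(1) and the
    Gaussian posterior N(b_hat1, var_b1) of b (closed form for this scalar model)."""
    denom = Q * (b_hat1 ** 2 + var_b1) + r
    return 0.5 * Q * a ** 2 * x1 ** 2 * (Q * var_b1 + r) / denom + 0.5 * Q * V


def closed_loop_cost(u0, x0=1.0, a=0.7, b_hat0=0.6, var_b0=0.52,
                     V=0.2, Q=10.0, r=0.1, n=2001):
    """Expected cost (6.2) when u(0) = u0 is applied and u(1) is chosen optimally
    after observing x(1). The innovation z = x(1) - a x0 - b_hat0 u0 is N(0, S)."""
    S = var_b0 * u0 ** 2 + V           # innovation variance: parameter part + noise part
    var_b1 = var_b0 * V / S            # posterior variance of b, cf. (A.3)
    lo = -10.0 * math.sqrt(S)
    h = 20.0 * math.sqrt(S) / (n - 1)  # trapezoidal step over [-10 sd, +10 sd]
    total = 0.0
    for i in range(n):
        z = lo + i * h
        weight = 0.5 if i in (0, n - 1) else 1.0
        pdf = math.exp(-0.5 * z * z / S) / math.sqrt(2.0 * math.pi * S)
        x1 = a * x0 + b_hat0 * u0 + z
        b_hat1 = b_hat0 + var_b0 * u0 * z / S   # posterior mean of b
        total += weight * pdf * last_stage_cost(x1, b_hat1, var_b1, a, Q, r, V)
    return 0.5 * r * u0 ** 2 + h * total


# Usage: scan u(0) on a grid and locate the grid minimizer.
grid = [i / 20.0 for i in range(-60, 61)]
costs = [closed_loop_cost(u) for u in grid]
u_best = grid[costs.index(min(costs))]
```

At $u(0) = 0$ the posterior of $b$ equals the prior (no learning), which is why this point can sit on a local maximum of the cost; varying `Q` or `V` in the scan is one way to explore the behavior described for Figs. 5 and 6, without claiming to reproduce the published curves.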


VII.  Conclusions 

While still very few stochastic control problems have been solved optimally, insight into such problems can be gained by using the decomposition of the expected cost. This decomposition, based on stochastic dynamic programming, yields three cost components: one deterministic and two stochastic ones. The stochastic terms quantify the effect of the various uncertainties on the performance index. The effects of these stochastic terms have been associated with Feldbaum's concepts of caution and probing. Furthermore, this decomposition revealed three classes of stochastic control problems: caution dominated, probing dominated, and essentially deterministic. This admittedly fuzzy classification pointed out that there are stochastic control problems where significant improvements can be expected when using an appropriately sophisticated control algorithm. The examples show that one can assess, before extensive simulations, whether significant performance improvement can be expected in a stochastic control problem. It has also been shown that the various cost components can vary drastically with changes in the performance index weightings. The probing component of the cost can be nonconvex, thus leading to local minima in the total cost.
Simple Examples of Probing and Caution

Consider the scalar system

$$x(k+1) = a\,x(k) + b\,u(k) + v(k) \qquad (A.1)$$

with $a$ known, $b$ an unknown parameter with prior mean $\hat{b}(0)$ and variance $\sigma_b^2(0)$, and $v(k)$ a zero-mean white noise sequence with variance $\sigma_v^2$. Letting

$$y(k) = x(k), \qquad (A.2)$$

i.e., perfect state observations, it follows from a standard least-squares argument that the uncertainty about parameter $b$ at time $k+1$ depends on the control at $k$ as follows:

$$\sigma_b^2(k+1) = \frac{\sigma_b^2(k)\,\sigma_v^2}{\sigma_b^2(k)\,u^2(k) + \sigma_v^2}. \qquad (A.3)$$

This clearly illustrates the control's dual effect: in addition to its effect on the state, the control also affects the future information accuracy.
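Equation (A.3) is easy to check numerically. The Python sketch below, with illustrative prior and noise variances chosen here rather than taken from the text, shows that a zero control leaves $\sigma_b^2$ unchanged while a larger $|u(k)|$ shrinks it, which is the dual effect in its simplest form.

```python
def posterior_var_b(var_b, u, var_v):
    """Variance of b after observing x(k+1), eq. (A.3):
    var_b * var_v / (var_b * u^2 + var_v)."""
    return var_b * var_v / (var_b * u ** 2 + var_v)


# Illustrative (assumed) values: prior variance 1.0, noise variance 0.1.
for u in (0.0, 0.5, 1.0, 2.0):
    print(u, posterior_var_b(1.0, u, 0.1))
```

The printed variance decreases as $|u|$ grows, and it depends on $u$ only through $u^2$, so positive and negative controls of equal magnitude are equally informative.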

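Consider next the same system with the (one-step horizon, or myopic) cost

$$C(k) = x^2(k+1) + \lambda\,u^2(k). \qquad (A.4)$$

The control that minimizes

$$J(k) = E\{C(k) \mid I^k\}, \qquad (A.5)$$

where $I^k$ denotes the information available at time $k$, can be obtained easily as

$$u^*(k) = -\frac{a\,x(k)\,\hat{b}(k)}{\hat{b}^2(k) + \sigma_b^2(k) + \lambda}. \qquad (A.6)$$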

Note that, because of the myopicity of the cost (A.4), this controller ignores any possibility of learning. On the other hand, because of the uncertainty in $b$, this control can be very cautious: a large variance $\sigma_b^2(k)$ can significantly decrease the value of the control $u^*(k)$ compared to the case where there is no uncertainty in $b$, or where this uncertainty is ignored, as an HCE controller would do:

$$u^{HCE}(k) = -\frac{a\,x(k)\,\hat{b}(k)}{\hat{b}^2(k) + \lambda}. \qquad (A.7)$$
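The contrast between (A.6) and (A.7) can be illustrated with a small Python sketch. It evaluates the expected myopic cost $E\{C(k)\} = (a x + \hat{b} u)^2 + \sigma_b^2 u^2 + \sigma_v^2 + \lambda u^2$, which follows from (A.1) and (A.4), and checks numerically that (A.6) minimizes it while the HCE control does not. The parameter values are illustrative assumptions.

```python
def expected_myopic_cost(u, x, a, b_hat, var_b, var_v, lam):
    """E{x(k+1)^2 + lam u^2} for x(k+1) = a x + b u + v, b with mean b_hat, variance var_b."""
    return (a * x + b_hat * u) ** 2 + var_b * u ** 2 + var_v + lam * u ** 2


def u_cautious(x, a, b_hat, var_b, lam):
    """Optimal myopic control, eq. (A.6)."""
    return -a * x * b_hat / (b_hat ** 2 + var_b + lam)


def u_hce(x, a, b_hat, lam):
    """Heuristic certainty equivalence control, eq. (A.7): var_b is ignored."""
    return -a * x * b_hat / (b_hat ** 2 + lam)


# Illustrative (assumed) values.
x, a, b_hat, var_b, var_v, lam = 1.0, 0.7, 0.6, 0.5, 0.1, 0.1
uc = u_cautious(x, a, b_hat, var_b, lam)
uh = u_hce(x, a, b_hat, lam)
print(uc, uh)  # the cautious control has the smaller magnitude
```

Setting `var_b = 0.0` makes the two controls coincide, which is the certainty equivalence case.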

The optimal myopic controller (A.6) can then exhibit the turn-off phenomenon [18], [36]: it can be small because of large uncertainty in $b$, and this will then prevent, according to (A.3), the reduction of this uncertainty.
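The turn-off phenomenon follows from combining (A.6) with (A.3). The Python sketch below uses assumed values with a small $\hat{b}$ and a large $\sigma_b^2$, applies one step of each controller, and computes the resulting posterior variance of $b$: the cautious control is nearly zero and leaves $\sigma_b^2$ almost unchanged, while the larger HCE control produces substantial (accidental) learning.

```python
def posterior_var_b(var_b, u, var_v):
    """Eq. (A.3)."""
    return var_b * var_v / (var_b * u ** 2 + var_v)


def u_cautious(x, a, b_hat, var_b, lam):
    """Eq. (A.6)."""
    return -a * x * b_hat / (b_hat ** 2 + var_b + lam)


def u_hce(x, a, b_hat, lam):
    """Eq. (A.7)."""
    return -a * x * b_hat / (b_hat ** 2 + lam)


# Assumed values: poor prior (b_hat small, var_b large).
x, a, b_hat, var_b, var_v, lam = 1.0, 0.7, 0.1, 10.0, 0.1, 0.1
var_after_cautious = posterior_var_b(var_b, u_cautious(x, a, b_hat, var_b, lam), var_v)
var_after_hce = posterior_var_b(var_b, u_hce(x, a, b_hat, lam), var_v)
print(var_after_cautious, var_after_hce)
```

Repeating the cautious step keeps $\sigma_b^2$ large, which in turn keeps the control small; this self-reinforcing loop is what the text calls turn-off.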
Abstract. The topic of this paper is the application of some recent results in stochastic control to an aerospace problem where there are large uncertainties in the dynamics of the plant to be controlled. An approximation to stochastic Dynamic Programming is considered that results in an adaptive control of the "closed-loop" type: it utilizes feedback (latest state and parameter estimates and their uncertainties) as well as their anticipated future uncertainties, i.e., it anticipates (subject to causality) subsequent feedback. This algorithm has a feature that allows the control to enhance the parameter identification in real time. This is done using the control's dual effect: the control can affect the state as well as the (augmented) state uncertainty and thus can reduce the uncertainty about some parameters. A flight control application in which stochastic adaptive control appears to offer significant payoff is the active control of aircraft wing-store flutter. Improved flutter suppression can be accomplished with an adaptive controller that has the capability to learn and identify the flutter modes during the flight.

1.  INTRODUCTION 

The topic of this paper is the application of some recent results in stochastic control to an aerospace problem where there are large uncertainties in the dynamics of the plant to be controlled. While stochastic Dynamic Programming [B1,B2] yields, in principle, the solution to general stochastic control problems, the curse of dimensionality prevents its application to nonlinear problems. An important class of problems is that of linear systems with unknown and possibly time-varying parameters. Such a system is nonlinear in the augmented state, which is made up of the proper state and the unknown parameters.

It was pointed out in [B3] that the optimal stochastic control depends, in general, on

(i) the current information (e.g., the latest estimate of the state and parameters);

(ii) the quality of the current information (represented, e.g., by the covariance associated with the above-mentioned estimates);

(iii) the anticipated quality of the subsequent (future) information.

The well-known optimal solution of the Linear Quadratic Gaussian problem (without unknown parameters) has the so-called Certainty Equivalence property: the resulting feedback control has the same gain as the corresponding deterministic problem and only uses the state estimate instead of the (unavailable) state. This solution exhibits only feature (i) from above: it is independent of the quality of the state estimate. The "Heuristic Certainty Equivalence" (HCE) algorithm for linear systems with unknown parameters consists of the following: the parameters are estimated in real time and the feedback gain is computed using the latest parameter estimates as if they were the true values [S1].*

This algorithm, while adaptive, does not take into consideration the quality of the parameter estimates.

An approximation to stochastic Dynamic Programming was presented in [T1,T2,B4].

In the terminology of [B3], the resulting adaptive control is of the "closed-loop" (CL) type: it utilizes feedback (latest state and parameter estimates and their uncertainties) as well as their anticipated future uncertainties, i.e., it anticipates (subject to causality) subsequent feedback.

This algorithm has all three features (i)-(iii) mentioned above. In particular, the third feature allows the control to enhance the parameter identification in real time. This is done using the control's dual effect [F1]: the control can affect the state as well as the (augmented) state uncertainty and thus can reduce the uncertainty about some parameters. This is the "probing" or "estimation/identification enhancement" property of the control. For this reason the algorithm was also called "dual control." At the same time the control also has to exercise "caution" in order to prevent the performance from suffering due to the existing uncertainties.

*The strict meaning of Certainty Equivalence is that all the random variables in the problem under consideration can be replaced by their means: the problem is equivalent to one with perfect certainty.

The  connection  between  the  stochastic 
Dynamic  Programming  and  these  two  properties 
of  "probing"  and  "caution"  of  an  adaptive 
controller  is  discussed  in  Section  2. 

A flight control application in which stochastic adaptive control appears to offer significant payoff is the active control of aircraft wing-store flutter. Fighter aircraft are required to carry many different combinations of external wing-mounted stores to perform a variety of missions over a wide operational envelope. Wing mounting of these stores gives rise to different flutter speeds. Release of the wing-mounted stores will cause an abrupt change in the damping and frequencies of wing structural modes.

The structural and aerodynamic models used in the design of "constant gain" type controllers are increasingly inaccurate for higher-frequency aeroelastic dynamics.

Thus,  improved  flutter  suppression  could  be 
accomplished  with  an  adaptive  controller, 
which  includes  the  capability  to  learn  and 
identify  the  flutter  modes  during  the  flight 
mission. 

The ability to successfully suppress flutter during a change in store configuration requires that the adaptive controller identify the structural modes very rapidly. Failure to identify the system parameters quickly enough could result in an instability or cause structural damage. For this reason, an adaptive control which provides identification enhancement through probing would result in more rapid identification of system parameters than a heuristic certainty equivalence controller.

Section 3 describes the flutter model considered, and simulation results are presented in Section 4. It is shown that the CL control, by anticipating the learning of the parameters, can enhance their identification, i.e., be "actively adaptive." The HCE control is adaptive, but only passively so, and its "accidental learning" is not as fast as the CL controller's.

2. PROBING AND CAUTION IN ADAPTIVE CONTROL

The actively adaptive control approach developed earlier in [T1,T2,B4] is described in this section, and a decomposition of the stochastic cost is presented that will indicate the effect of the uncertainties on the control: whether it should be more aggressive or more cautious in comparison with the heuristic certainty equivalence (HCE, in which all the random variables are replaced by their means). This algorithm is suboptimal in the sense that certain approximations are used in expressing the optimal return function in the solution of the dynamic programming equation. In particular, Taylor series expansions about some nominal trajectory, including second-order terms, are used. The convenient and intuitively appealing form of the solution, together with its computational tractability, however, make it a very useful tool. Only a brief outline of the algorithm is given to facilitate understanding of the stochastic cost decomposition (see [B4] for details).

Consider the system whose state $x(k)$, an $n$-vector (which has been augmented to include unknown parameters), evolves according to the equation

$$x(k+1) = f[k, x(k), u(k)] + v(k), \qquad k = 0, 1, \ldots, N-1 \qquad (2.1)$$

and whose observations are given by $y(k)$, an $m$-vector, according to

$$y(k) = h[k, x(k)] + w(k), \qquad k = 1, \ldots, N-1. \qquad (2.2)$$

The initial condition, $x(0)$, is a random variable with mean $\hat{x}(0|0)$ and covariance $\Sigma(0|0)$; $v(k)$ and $w(k)$ are the process and measurement noises, with known statistics up to second order. The cost function is taken as

$$C(N) = L[x(N)] + \sum_{k=0}^{N-1} \{ L[x(k),k] + \phi[u(k),k] \}. \qquad (2.3)$$

The optimal closed-loop expected cost-to-go can be written as [B4]

$$J_{CL}(N-k) = J_D(N-k) + J_C(N-k) + J_P(N-k) \qquad (2.4)$$

where

$$J_D(N-k) = \phi[u(k),k] + C_0(N-k-1) + \gamma_0(k+1) \qquad (2.5)$$

is the deterministic part of the cost and

$$J_C(N-k) = \tfrac{1}{2}\,\mathrm{tr}[K_0(k+1)\,\Sigma(k+1|k)] + \tfrac{1}{2}\sum_{j=k+1}^{N-1} \mathrm{tr}[K_0(j+1)\,V(j)] \qquad (2.6)$$

$$J_P(N-k) = \tfrac{1}{2}\sum_{j=k+1}^{N-1} \mathrm{tr}[\mathcal{A}_{0,xx}(j)\,\Sigma(j|j)] \qquad (2.7)$$

are the stochastic terms in the cost obtained via the perturbation problem. In the above, $V$ is the process noise covariance, $\Sigma$ is the covariance of the augmented state, and $\gamma$, $K$, and $\mathcal{A}$ are given by appropriate recursions detailed in [B4].

The first stochastic term, (2.6), reflects the effect of the uncertainty at time $k$ and subsequent process noises on the cost. These uncertainties cannot be affected by $u(k)$, but their weightings do depend on it. The effect of these uncontrollable uncertainties on the cost should be minimized by the control; this term indicates the need for the control to be cautious and is thus called the caution term. The second stochastic term, (2.7), accounts for the effect of uncertainties when subsequent decisions (corrective actions) will be made. The weighting of these future uncertainties is non-negative ($\mathcal{A}_{0,xx}$ is positive semidefinite). If the control can reduce the future updated covariances by probing (experimentation), it can thus reduce the cost. The weighting matrix $\mathcal{A}_{0,xx}$ yields approximately the value of future information for the problem under consideration. Therefore this is called the probing term.

Note that even if the control has no dual effect, i.e., it does not affect the future covariance $\Sigma$ of the augmented state (which includes the random parameters), the weighting of these covariances is still affected by the control. Therefore this procedure accounts not only for the dual effect but for all the stochastic effects in the performance index.

The benefit of probing is weighed against its cost, and a compromise is chosen so as to minimize the sum of the deterministic, caution, and probing terms. The minimization of $J_{CL}$ will also achieve a tradeoff between the present and future actions according to the information available at the time the corresponding decisions are made.

To find the closed-loop control $u(k)$, the minimization of (2.4) is performed using a search procedure. At every $k$, to each control $u(k)$ for which (2.4) is evaluated during the search there corresponds a predicted state, and to this predicted state a sequence of deterministic controls is attached that defines the nominal trajectory. The only use of the nominals and perturbations is to make possible the evaluation of the cost-to-go optimized in a closed-loop manner. This procedure is repeated every time a new control is to be obtained.

If the uncertainty dominates the problem, then one can distinguish two cases. (1) The caution component, $J_C$, dominates. Then, since this is "uncontrollable" uncertainty, one has a highly uncertain model which cannot be improved in the course of the control period. (2) The probing component, $J_P$, dominates. Then, with the dual effect of the control, one can reduce the uncertainty of the model; thus the model, while uncertain at the beginning, might prove to be ultimately adequate for the control problem under consideration. A third case occurs when (3) the deterministic component of the cost, $J_D$, dominates: then the parameter uncertainties are of no significant consequence. This is the most desirable situation because then one can use CE, i.e., the least expensive control algorithm, with good performance. However, only the stochastic control approach can indicate this.
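The three-way classification can be expressed as a simple rule on the cost components evaluated at the minimizing control. The Python sketch below is an illustration only: the 50 percent dominance threshold and the fourth "mixed" label are assumptions introduced here, since the text treats the classification qualitatively, and the components are assumed non-negative.

```python
def classify(j_d, j_c, j_p, threshold=0.5):
    """Label a problem by which component of J_CL = J_D + J_C + J_P dominates.

    threshold is a hypothetical share of the total above which a component
    is called dominant; the components are assumed non-negative.
    """
    total = j_d + j_c + j_p
    shares = {
        "essentially deterministic": j_d / total,
        "caution dominated": j_c / total,
        "probing dominated": j_p / total,
    }
    label, share = max(shares.items(), key=lambda item: item[1])
    return label if share >= threshold else "mixed"


# Hypothetical components with a probing share of 80 percent.
print(classify(j_d=0.1, j_c=0.1, j_p=0.8))
```

A fixed threshold is a design choice for the sketch; in practice the share of each component would be read off a decomposition plot such as those discussed above.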

3.  A  SIMPLIFIED  WING-STORE  FLUTTER  MODEL 

A simplified version of a wing-store flutter model can be represented by a second-order differential equation. The state-space model, with position and velocity components, can be written as

$$\dot{x} = \begin{bmatrix} 0 & 1 \\ -\omega_0^2 & -2\zeta\omega_0 \end{bmatrix} x + \begin{bmatrix} 0 \\ \kappa \end{bmatrix} u + v \qquad (3.1)$$

with measurements of velocity only

$$y = [0 \;\; 1]\,x + w \qquad (3.2)$$

where $v$ and $w$ are the process and measurement noises, respectively.

Typical values of the parameters for model (3.1) are $\omega_0 = 20 \pm 10$, $\zeta = 0.05 \pm 0.1$ (the system can become open-loop unstable), and $\kappa = 1 \pm 0.9$ (the control gain can become very low).

A more general flutter model would include a lead-lag transfer function between the control input and the input $u$ of model (3.1). However, the simplified model (3.1) is sufficient to demonstrate the adaptive control concept of improved control by identification enhancement.

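The discretized version of (3.1) is, for a sufficiently high sampling rate (typically ten times its natural frequency),

$$x(k+1) = \begin{bmatrix} 1 & \Delta T \\ -\omega_0^2\Delta T & 1 - 2\zeta\omega_0\Delta T \end{bmatrix} x(k) + \begin{bmatrix} 0 \\ \kappa\Delta T \end{bmatrix} u(k) + \begin{bmatrix} v_1(k) \\ v_2(k) \end{bmatrix} \qquad (3.3)$$

where $v(k)$ is a zero-mean white noise sequence.

For $\omega_0 = 20$ one has $f = 20/2\pi \approx 3.2$, $T \approx 0.3$, and the sampling time was chosen as $\Delta T = 0.03$. The nominal parameters of the discrete-time model are then

$$\theta_1 = -\omega_0^2\Delta T = -12, \qquad \theta_2 = 1 - 2\zeta\omega_0\Delta T = 0.94, \qquad \theta_3 = \kappa\Delta T = 0.03 \qquad (3.4)$$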

The augmented state model consists of (3.3) and the model for the parameters with additive zero-mean white noise

$$\theta_i(k+1) = \theta_i(k) + v_{i+2}(k), \qquad i = 1, 2, 3 \qquad (3.5)$$




i.e., the parameters were assumed to behave (over the relatively short horizon of the problem) as Wiener processes. This was done to allow for the changes that occur in the flutter dynamics during the flight.

The initial estimate for the augmented state was

$$\hat{x}(0|0) = [0 \;\; 10 \;\; -12 \;\; 0.94 \;\; 0.03]' \qquad (3.6)$$

with the covariance matrix assumed diagonal

$$\Sigma(0|0) = \mathrm{diag}[10^{-2},\; 1,\; 36,\; \sigma_{\theta_2}^2,\; \sigma_{\theta_3}^2] \qquad (3.7)$$

The last two terms, reflecting the damping and input gain uncertainty, can take a number of values.

The process noise covariance was

$$V = \mathrm{diag}[0,\; 10^{-2},\; 0,\; V_{44},\; V_{55}] \qquad (3.8)$$

The terms $V_{44}$ and $V_{55}$ were nonzero in the runs where the effects of time-varying damping and control gain, respectively, were investigated.

The flutter control problem can be represented as the minimization of a quadratic cost criterion

$$J = E\left\{ \sum_{k=1}^{N} \left[ x'(k)\,Q(k)\,x(k) + r\,u^2(k-1) \right] \right\} \qquad (3.9)$$

with

$$Q(k) = \begin{bmatrix} 0.001 & 0 \\ 0 & 0.1 \end{bmatrix}, \qquad r = 1 \qquad (3.10)$$

where $N$ is chosen to reflect the desired sample duration during the store configuration change. For the problem here $N = 5$ was chosen. As indicated by (3.10), the goal is to keep the velocity, $x_2$, small with limited amounts of control.

4. SIMULATION RESULTS

The flutter model of (3.3) and (3.4) was investigated with nominal parameter values shown in (3.6), (3.7), and (3.8). Two controllers were evaluated: (1) the closed-loop control $u^{CL}$, which minimizes the quadratic cost (3.9) and assumes uncertainty in the flutter parameters, and (2) the Heuristic Certainty Equivalence control $u^{HCE}$, which assumes the flutter parameters are known without error. The case of time-invariant parameters is shown first, followed by the case where the parameters vary with time (Wiener process) as in (3.5).

The first set of simulations consisted of the evaluation of the first-period cost decomposition presented in Section 2 for time-invariant parameters.

Table 4.1 presents the results in terms of the cost components evaluated at the Heuristic Certainty Equivalence control value $u^{HCE}(0)$ and at the value obtained by minimizing (2.4), $u^{CL}(0)$. In cases 1 and 2, with moderate uncertainties in $\theta_2$ (damping) and $\theta_3$ (input gain), the three cost components (deterministic, probing, and caution) are of approximately the same magnitude. The minimum of the closed-loop cost is very close to the HCE control, which minimizes only the deterministic cost (because HCE ignores all uncertainties).

For larger uncertainties in the damping the caution component increases, but the reduction in the probing component, with a larger-magnitude control $|u^{CL}| > |u^{HCE}|$, yields a small reduction of the total cost. Case 5 considers the situation where the gain uncertainty is very large. This situation leads to a significant dominance of the caution effect: the magnitude of the CL control is significantly smaller than that of the HCE control.

The significance of the results presented in Table 4.1 for the flutter problem (assuming a time-invariant parameter description) is as follows. For large uncertainty in the damping parameter ($\theta_2$) the performance of the $u^{CL}$ and $u^{HCE}$ controllers is nearly the same. However, if in addition the control gain has large uncertainty (case 5), the CL control shows a 7% reduction in the cost, all of which is due to the caution component. This implies that uncertain knowledge of the control gain dictates that the optimal control should exhibit more caution than the Heuristic Certainty Equivalence controller.

The second set of simulations was performed for a time-varying description of the flutter parameters. A time-varying parameter case was simulated by assuming there is process noise in (3.5) for $i = 3$. The standard deviation of the noise affecting the input gain was taken as $\sqrt{V_{55}} = 0.014$. The results are shown in Table 4.2 for different values of the initial damping uncertainty. As can be seen, probing dominates: a significant reduction in the probing cost and a 10% reduction in the total cost can be obtained by using an actively adaptive control like $u^{CL}$. This control anticipates that changes will occur in the parameter even though it does not know what the changes will be, since they are modelled by zero-mean noise with variance $V_{55}$, according to (3.5). Consequently, this "anticipation" (which is restricted to be causal) leads the control to enhance the identification of the input gain, whose variance would otherwise be excessively large.

The results in Table 4.2 demonstrate that flutter suppression can be more effectively achieved by probing the system to enhance identification of the control gain for the case where the control gain can vary with time. The results of Table 4.2 indicate the average performance improvements from using the CL controller. Specific time history results can give a detailed examination of the identification enhancement property of the CL control.


The next set of simulations consists of time history runs with a time-varying parameter as in case 7. The true value for the gain was $\theta_3 = 0.03$. The process noise $v_5(1)$ simulated the change of $\theta_3$ from time 1 to time 2. The goal was to see how the probing control, as shown in case 7 (Table 4.2), was able to enhance the real-time parameter identification in order to reduce the cost. An exact assessment of the potential benefits from using $u^{CL}$ vs. $u^{HCE}$ would involve many Monte Carlo runs where all the random variables (initial conditions, parameters, noises) have to be generated according to their statistical characterizations [B5], and the results require special analysis [M1]. Only a few runs were carried out, with only the noise $v_5(1) = \theta_3(2) - \theta_3(1)$ being nonzero while all the other noises were set to zero, to evaluate the cumulated cost over $N = 5$ steps.

Table 4.3 shows these values for the two control policies for a few parameter changes. In cases 8-10 the initial estimate of the input gain was the same as the true value, i.e., $\hat\theta_3(0) = \theta_3(0) = 0.03$. In this situation, which initially favors the HCE controller, the CL controller is still better when the gain decreases (cases 9 and 10).

Note that this decrease of the control gain causes significant cost increases, and this is when the CL controller proves itself useful.

In cases 11 and 12 the initial gain estimate was $\hat\theta_3(0) = 0.05$, i.e., it was overestimated.

The final set of simulations represents time histories where both the damping parameter and the control gain $\theta_3$ experience abrupt changes. This would be typical of a wing store configuration change. The damping and control gain changes are shown in Fig. 4.1.

For this case the damping parameter goes from a stable value of 0.94 to an unstable value of 1.06. The control gain ($\theta_3$) goes from 0.03 to 0.005. The standard deviation of the noise for the damping parameter was 0.1.

The cumulated cost for this case is shown in Table 4.4 for $u^{CL}$ and $u^{HCE}$.

The CL control is seen to improve the performance over the HCE controller. This improved performance is due to identification enhancement of the damping parameter.

This can be seen in Figure 4.2, where the CL control is shown to identify the damping parameter more rapidly. Figure 4.3 shows the identified gain parameter, which is successfully identified by both controllers after 5 time steps.

5.  CONCLUSION 

The simulation results presented in this paper indicate that a potential improvement in flutter suppression is possible using an adaptive control of the closed-loop type. This improvement is a direct result of identification enhancement due to probing in the control solution. A more detailed flutter model and further simulation are required to fully quantify the maximum achievable performance using the CL control.
Summary

A method is presented for the construction of a dual control algorithm having a linear feedback structure. The application of this algorithm to the control of a helicopter is discussed and simulation results are given.


Abstract 

The  methodology  for  deriving  a  dual  control  algorithm  that  has  a  linear 
feedback  form  is  presented.  This  control,  while  simple,  has  the  capability 
of  enhancing  the  identification  of  the  system's  unknown  parameters.  A  dual 
controller  for  a  plant  describing  the  helicopter  higher  harmonic  vibration 
control  problem  is  presented  together  with  simulation  results. 


1.  Introduction 

In the control of nonlinear stochastic systems the control has, in general, a dual effect [F1, B1]: it affects the system's state as well as its uncertainty. Since in linear plants with unknown parameters the control has a dual effect, it can potentially be used to enhance the real-time identification of the system parameters.

The attractiveness of a linear controller that incorporates the dual effect has been pointed out in [M1]. Previous dual control algorithms [A2, B2, W1, W2] required numerical search, which makes their implementation costly. The success of the self-tuning regulator [A1], which stems from its ease of implementation as well as its effectiveness, prompted us to investigate control algorithms that have a linear feedback form but incorporate the dual effect.

The problem considered in Section 2 is the simplest one in which there is a dual effect, chosen in order to illustrate the concept. A 2-stage optimization problem is then formulated with stochastic dynamic programming in Section 3, and the controller is derived in Section 4.

In  Section  5  an  algorithm  based  on  this  methodology  is  derived  for 
a  multiple-input  multiple-output  model  corresponding  to  a  simplified 
version  of  the  "higher  harmonic  control"  of  helicopter  vibration  [Wi,  M.'  ] . 
Simulation  results  are  presented  in  Section  6. 
2. Problem Formulation

The following memoryless unknown-gain system with plant and measurement noises is considered. The plant equation is

$$x(k+1) = b\,u(k) + v(k) \qquad (2.1)$$

with

$$E\,v(k) = 0\,; \qquad E\,v(k)v(j) = V\,\delta_{kj} \qquad (2.2)$$

and the measurement is given by

$$y(k) = x(k) + w(k) \qquad (2.3)$$

with

$$E\,w(k) = 0\,; \qquad E\,w(k)w(j) = W\,\delta_{kj} \qquad (2.4)$$

and

$$E\,v(k)w(j) = 0 \qquad (2.5)$$

The estimation of the unknown gain $b$ (assumed time invariant here) is done according to the following equations:

$$\hat b(k+1) = \hat b(k) + P(k)u(k)\,[P(k)u^2(k) + V + W]^{-1}\,[y(k+1) - \hat b(k)u(k)] \qquad (2.6)$$

$$P(k+1) = E[b - \hat b(k+1)]^2 = P(k)(V + W)\,[P(k)u^2(k) + V + W]^{-1} \qquad (2.7)$$

Note from (2.7) that the control affects the variance of the parameter estimate, i.e., it has the dual effect [F1, B1].
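A minimal Python sketch of the update (2.6)-(2.7), assuming scalar quantities; the function name `update_gain_estimate` is hypothetical and not taken from the report. It makes the dual effect visible: the returned variance depends on $u(k)$ but not on the measured $y(k+1)$.

```python
# Illustrative sketch only; the function name is hypothetical.
def update_gain_estimate(b_hat, P, u, y_next, V, W):
    """One step of (2.6)-(2.7) for x(k+1) = b u(k) + v(k), y(k) = x(k) + w(k).

    Returns the updated estimate b_hat(k+1) and its variance P(k+1).
    """
    # Variance of the innovation y(k+1) - b_hat(k) u(k): P u^2 + V + W
    S = P * u * u + V + W
    # (2.6): correct the estimate with gain P u / S
    b_new = b_hat + P * u / S * (y_next - b_hat * u)
    # (2.7): the new variance depends on u(k) but not on y(k+1)
    P_new = P * (V + W) / S
    return b_new, P_new


if __name__ == "__main__":
    for u in (0.0, 1.0, 2.0):
        _, P_new = update_gain_estimate(0.5, 1.0, u, 0.0, 0.5, 0.5)
        print(f"u = {u}: P(k+1) = {P_new:.3f}")
```

With $P = 1$ and $V = W = 0.5$ the printed variances are 1.000, 0.500 and 0.200 for $u = 0, 1, 2$: the larger the control magnitude, the better the gain is identified.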

The control criterion to be minimized will be taken as the expected value of the cost from step 0 to $N$

$$J(0) = E\{C(0)\} \qquad (2.8)$$

where

$$C(k) = \sum_{j=k}^{N} c[j, x(j), u(j)]$$

and, with $\xi(j)$ denoting the desired state at time $j$,

$$c(j) = q(j)[x(j) - \xi(j)]^2 + r\,u^2(j), \qquad j = 0, 1, \dots, N-1$$

$$c(N) = q(N)[x(N) - \xi(N)]^2$$

3. The Multistage Problem and Dynamic Programming

The general equation of the stochastic dynamic programming is

$$J^*(k, Y^k) = \min_{u(k)} E[c(k) + J^*(k+1, Y^{k+1}) \mid Y^k], \qquad k = N-1, \dots, 1, 0 \qquad (3.1)$$

where $J^*(k)$ is the "cost-to-go" from $k$ to $N$ and $Y^k$ is the cumulated information at time $k$, when the control $u(k)$ is to be determined.

Due to the memoryless nature of the system (2.1), the only coupling between the stages in a multistage problem is the informational one: the effect of the control on the quality of the estimate of the parameter.

The last control is obtained from

$$\begin{aligned}
J^*(N-1) &= \min_{u(N-1)} E\{q(N-1)[x(N-1) - \xi(N-1)]^2 + r\,u^2(N-1) + q(N)[x(N) - \xi(N)]^2 \mid Y^{N-1}\} \\
&= \min_{u(N-1)} \{E[q(N-1)[x(N-1) - \xi(N-1)]^2 \mid Y^{N-1}] + [r + q(N)(\hat b^2(N-1) + P(N-1))]\,u^2(N-1) \\
&\qquad - 2q(N)\xi(N)\hat b(N-1)\,u(N-1) + q(N)[V + \xi^2(N)]\}
\end{aligned} \qquad (3.2)$$

The minimization yields

$$u^*(N-1) = \{r + q(N)[\hat b^2(N-1) + P(N-1)]\}^{-1}\,q(N)\xi(N)\hat b(N-1) \qquad (3.3)$$

This yields the optimal cost-to-go

$$J^*(N-1) = E[q(N-1)(x(N-1) - \xi(N-1))^2 \mid Y^{N-1}] + \tilde J^*(N-1) \qquad (3.4)$$

where

$$\begin{aligned}
\tilde J^*(N-1) &= -[r + q(N)(\hat b^2(N-1) + P(N-1))]^{-1}(q(N)\xi(N)\hat b(N-1))^2 + q(N)\xi^2(N) + q(N)V \\
&= [r + q(N)(\hat b^2(N-1) + P(N-1))]^{-1}(r + q(N)P(N-1))\,q(N)\xi^2(N) + q(N)V
\end{aligned} \qquad (3.5)$$

is the cost-to-go excluding the term which is not affected by the current control.

The control (3.3) is the well-known "one step ahead cautious" control. It is the optimal control, for all $k$, if the cost has a sliding horizon of only one step (also called "myopic" control).
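The following Python sketch, with hypothetical function names, evaluates the cautious control (3.3) and the cost-to-go (3.5), and lets one check numerically that the expected cost in (3.2) evaluated at $u^*(N-1)$ equals (3.5). Scalar $q(N)$, $\xi(N)$, $r$ and $V$ are assumed.

```python
# Illustrative sketch only; function names are hypothetical.
def cautious_control(b_hat, P, q_N, xi_N, r):
    """One-step-ahead cautious control (3.3)."""
    return q_N * xi_N * b_hat / (r + q_N * (b_hat ** 2 + P))


def reduced_cost_to_go(b_hat, P, q_N, xi_N, r, V):
    """Cost-to-go (3.5): excludes the term not affected by u(N-1)."""
    return (r + q_N * P) * q_N * xi_N ** 2 / (r + q_N * (b_hat ** 2 + P)) + q_N * V


def expected_cost(u, b_hat, P, q_N, xi_N, r, V):
    """u-dependent part of (3.2): E{r u^2 + q(N)[x(N) - xi(N)]^2 | Y^(N-1)}."""
    # E[(b u + v - xi)^2] = (b_hat^2 + P) u^2 - 2 xi b_hat u + V + xi^2
    return r * u * u + q_N * ((b_hat ** 2 + P) * u * u
                              - 2.0 * xi_N * b_hat * u + V + xi_N ** 2)


if __name__ == "__main__":
    args = dict(b_hat=1.0, P=0.5, q_N=1.0, xi_N=2.0, r=0.1)
    u = cautious_control(**args)
    print(u, expected_cost(u, V=0.3, **args), reduced_cost_to_go(V=0.3, **args))
```

With $\hat b = 1$, $P = 0.5$, $q(N) = 1$, $\xi(N) = 2$, $r = 0.1$ and $V = 0.3$ the control is 1.25 and both costs equal 1.8. Raising $P$ to 5 shrinks the control to about 0.33, which is the caution effect.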

The next to the last control is to be obtained from the following

$$J^*(N-2, Y^{N-2}) = \min_{u(N-2)} E\{c(N-2) + J^*(N-1, Y^{N-1}) \mid Y^{N-2}\} \qquad (3.6)$$

The dependence of $J^*(N-1)$, given by (3.4), on $y(N-1)$ is via $\hat b(N-1)$. Since, as detailed in (3.5), $J^*(N-1)$ is a rational function of $\hat b(N-1)$, one cannot carry out explicitly the expectation in (3.6), which is over $y(N-1)$ conditioned on $Y^{N-2}$. Even if one could carry out this expectation explicitly, the dependence of the cost-to-go $J^*(N-1)$ on the previous control $u(N-2)$ via $P(N-1)$ poses a significant problem: the minimization of (3.6) would require solving a high order algebraic equation. This can be seen as follows.

Assume that $\hat b(N-1)$ in $J^*(N-1)$, given by (3.4), (3.5), is replaced by $\hat b(N-2)$, the estimate at the time $u(N-2)$ is to be computed. This removes the need to carry out the expectation of $J^*(N-1)$ conditioned on $Y^{N-2}$ in (3.6). Then (3.6) becomes an explicit function of $u(N-2)$ and, as shown by Sternby [S1], setting its derivative w.r.t. $u(N-2)$ to zero leads to a fifth order polynomial.
Thus the two main problems in performing the first backward iteration of the stochastic dynamic programming as given in (3.6) are the conditional expectation over the future measurement and the minimization. In the linear-quadratic problem the presence of only quadratic and linear terms (as opposed to rational functions here) makes an easy solution for the optimal control possible. The resulting solution, in the form of a linear feedback control, has been in wide use because of its ease of implementation. On the other hand, the linear problem with unknown parameters is encountered in many applications, and it is desirable to obtain (and evaluate) a dual controller which has the linear feedback form. The gain should in this case depend on the current as well as the expected future parameter uncertainties.
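To make the difficulty concrete, the sketch below evaluates the $u(N-2)$-dependent part of (3.6) with $\hat b(N-1)$ frozen at $\hat b(N-2)$ and minimizes it by a crude grid search instead of solving the fifth order polynomial of [S1]. The grid search, the search interval and the function names are assumptions made for illustration only.

```python
# Illustrative sketch only; function names and the grid search are assumptions.
def frozen_estimate_objective(u, b_hat, P, q1, xi1, q2, xi2, r, V, W):
    """u-dependent part of (3.6) with b_hat(N-1) replaced by b_hat(N-2).

    b_hat, P: b_hat(N-2), P(N-2); q1, xi1: q(N-1), xi(N-1); q2, xi2: q(N), xi(N).
    Terms that do not depend on u = u(N-2) are dropped.
    """
    # r u(N-2)^2 + E{q(N-1)[x(N-1) - xi(N-1)]^2 | Y^(N-2)}, constants dropped
    stage = r * u * u + q1 * ((b_hat ** 2 + P) * u * u - 2.0 * xi1 * b_hat * u)
    # P(N-1) from (2.7): this is how u(N-2) reaches the future cost
    P_next = P * (V + W) / (P * u * u + V + W)
    # (3.5) evaluated with b_hat(N-2) and P(N-1), without the constant q(N) V
    tail = (r + q2 * P_next) * q2 * xi2 ** 2 / (r + q2 * (b_hat ** 2 + P_next))
    return stage + tail


def grid_argmin(f, lo, hi, n=20001):
    """Crude 1-D grid search, standing in for a polynomial root solver."""
    step = (hi - lo) / (n - 1)
    best_i = min(range(n), key=lambda i: f(lo + i * step))
    return lo + best_i * step


if __name__ == "__main__":
    args = (1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.1, 0.5, 0.5)
    u_two = grid_argmin(lambda u: frozen_estimate_objective(u, *args), -3.0, 3.0)
    u_myopic = 1.0 / (0.1 + 1.0 * (1.0 + 1.0))  # (3.3)-type control at N-2
    print(f"two-stage: {u_two:.4f}, myopic: {u_myopic:.4f}")
```

For these numbers the minimizer lies above the myopic value $1/2.1 \approx 0.476$: since $P(N-1)$ decreases with $|u(N-2)|$ and the future cost increases with $P(N-1)$, the anticipated learning pulls the control upward (probing).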

4. A Linear Feedback Dual Controller with a Two-Step Horizon

The cost-to-go $\tilde J^*(N-1)$ given in (3.5) depends on the following variables:

$$\tilde J^*(N-1) = \tilde J^*[N-1, \hat b^2(N-1), P(N-1)] \qquad (4.1)$$

The first, $\hat b^2(N-1)$, the squared estimate of the parameter at $N-1$, will have to be "averaged out" conditioned on $Y^{N-2}$. The second, $P(N-1)$, depends directly on $u(N-2)$, which is to be determined from (3.6).

The following first order series expansion of (4.1) is proposed

$$\tilde J^*(N-1) \approx \tilde J^*[N-1, \hat b^2(N-2), \bar P(N-1)] + \frac{\partial \tilde J^*(N-1)}{\partial \hat b^2(N-1)}[\hat b^2(N-1) - \hat b^2(N-2)] + \frac{\partial \tilde J^*(N-1)}{\partial P(N-1)}\,\frac{\partial P(N-1)}{\partial u^2(N-2)}[u^2(N-2) - \bar u^2(N-2)] \qquad (4.2)$$

In other words, the expansion is about the current estimate of the parameter, $\hat b(N-2)$, and a "nominal" variance for this parameter, $\bar P(N-1)$, given by

$$\bar P(N-1) = P(N-2)(V + W)[P(N-2)\bar u^2(N-2) + V + W]^{-1} \qquad (4.3)$$

where $\bar u(N-2)$ is a "nominal" control at $N-2$.

The following notations are introduced

$$\bar J(N-1) \triangleq \tilde J^*[N-1, \hat b^2(N-2), \bar P(N-1)] \qquad (4.4)$$

$$J_b(N-1) \triangleq \frac{\partial \tilde J^*(N-1)}{\partial \hat b^2(N-1)} \qquad (4.5)$$

$$J_P(N-1) \triangleq \frac{\partial \tilde J^*(N-1)}{\partial P(N-1)} \qquad (4.6)$$

$$P_u(N-1) \triangleq \frac{\partial P(N-1)}{\partial u^2(N-2)} \qquad (4.7)$$

where the partial derivatives (4.5)-(4.7) are evaluated at the same nominal values as (4.4). Note that (4.5) and (4.6) are the sensitivities of the cost-to-go w.r.t. the parameter and its uncertainty, respectively; (4.7) is the sensitivity of the parameter uncertainty w.r.t. the control. With these notations (4.2) can be written
$$\tilde J^*(N-1) \approx \bar J(N-1) + J_b(N-1)[\hat b^2(N-1) - \hat b^2(N-2)] + J_P(N-1)P_u(N-1)[u^2(N-2) - \bar u^2(N-2)] \qquad (4.8)$$

The asterisk on the cost, symbolizing optimality, has been kept even though (4.8) is only an approximation to the optimum.

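When inserting (4.8) into (3.6), its expected value conditioned on $Y^{N-2}$ will have to be computed. Note that only the second term on the r.h.s. of (4.8) is random when conditioned on $Y^{N-2}$. Its conditional expectation is

$$E\{J_b(N-1)[\hat b^2(N-1) - \hat b^2(N-2)] \mid Y^{N-2}\} = J_b(N-1)[E\{\hat b^2(N-1) \mid Y^{N-2}\} - \hat b^2(N-2)] = J_b(N-1)[P(N-2) - P(N-1)] \qquad (4.9)$$

where the last equality uses the fact that, conditioned on $Y^{N-2}$, $\hat b(N-1)$ has mean $\hat b(N-2)$ and variance $P(N-2) - P(N-1)$.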
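The identity used in (4.9), $E\{\hat b^2(N-1) \mid Y^{N-2}\} = \hat b^2(N-2) + P(N-2) - P(N-1)$, can be checked by simulation. The sketch below assumes a Gaussian prior for $b$ and Gaussian noise purely to have something to sample from (only the first two moments matter for the identity); the function name is hypothetical.

```python
import random


# Illustrative Monte Carlo check; the function name and Gaussian sampling are assumptions.
def mc_check_second_moment(b_hat, P, u, V, W, n=200_000, seed=1):
    """Return (sample mean of b_hat_new^2, b_hat^2 + P - P_new) for one step of (2.6)."""
    rng = random.Random(seed)
    S = P * u * u + V + W
    P_new = P * (V + W) / S                            # (2.7)
    acc = 0.0
    for _ in range(n):
        b = rng.gauss(b_hat, P ** 0.5)                 # parameter drawn from its prior
        y = b * u + rng.gauss(0.0, (V + W) ** 0.5)     # y(k+1) = b u + v + w
        b_new = b_hat + P * u / S * (y - b_hat * u)    # (2.6)
        acc += b_new * b_new
    return acc / n, b_hat ** 2 + P - P_new


if __name__ == "__main__":
    print(mc_check_second_moment(1.0, 1.0, 2.0, 0.5, 0.5))
```

With $\hat b = 1$, $P = 1$, $u = 2$ and $V = W = 0.5$ the exact value is 1.8, and the sample mean agrees with it to within Monte Carlo error.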

Notice that $u(N-2)$ enters into (4.9) via $P(N-1)$. A first order expansion of $P(N-1)$ about its nominal value (4.3) will be used in (4.9). Using notation (4.7), one replaces (4.9) by

$$E\{J_b(N-1)[\hat b^2(N-1) - \hat b^2(N-2)] \mid Y^{N-2}\} = J_b(N-1)[P(N-2) - \bar P(N-1) - P_u(N-1)(u^2(N-2) - \bar u^2(N-2))] \qquad (4.10)$$

The (approximate) conditional mean of (4.8) becomes, using (4.10),

$$E\{\tilde J^*(N-1) \mid Y^{N-2}\} = \bar J(N-1) + J_b(N-1)[P(N-2) - \bar P(N-1)] + [J_P(N-1) - J_b(N-1)]P_u(N-1)[u^2(N-2) - \bar u^2(N-2)] \qquad (4.11)$$

Combining (4.11) and (3.4) into (3.6) yields

$$\begin{aligned}
J^*(N-2) = \min_{u(N-2)} \{ & E[q(N-2)(x(N-2) - \xi(N-2))^2 + r\,u^2(N-2) + q(N-1)(x(N-1) - \xi(N-1))^2 \mid Y^{N-2}] + \bar J(N-1) \\
& + J_b(N-1)[P(N-2) - \bar P(N-1)] + [J_P(N-1) - J_b(N-1)]P_u(N-1)[u^2(N-2) - \bar u^2(N-2)]\}
\end{aligned} \qquad (4.12)$$


Ignoring the terms in (4.12) that are independent of $u(N-2)$ yields

$$\begin{aligned}
u^*(N-2) &= \arg\min_{u(N-2)} \{q(N-1)E[(x(N-1) - \xi(N-1))^2 \mid Y^{N-2}] + [r + (J_P(N-1) - J_b(N-1))P_u(N-1)]\,u^2(N-2)\} \\
&= \arg\min_{u(N-2)} \{[r + q(N-1)(\hat b^2(N-2) + P(N-2)) + (J_P(N-1) - J_b(N-1))P_u(N-1)]\,u^2(N-2) - 2q(N-1)\xi(N-1)\hat b(N-2)\,u(N-2)\}
\end{aligned}$$

which gives the control as

$$u^*(N-2) = [r + q(N-1)(\hat b^2(N-2) + P(N-2)) + (J_P(N-1) - J_b(N-1))P_u(N-1)]^{-1}\,q(N-1)\xi(N-1)\hat b(N-2) \qquad (4.13)$$

Note the presence of the caution effect above: the additive $P(N-2)$ in the denominator, which, being positive, will tend to decrease the control magnitude. However, the last term in the denominator is negative, reflecting the probing effect via the sensitivity functions (4.5)-(4.7), and thus will tend to increase the control. This can be seen as follows: $J_P(N-1)$ is positive (since the cost increases with uncertainty), $J_b(N-1)$ is negative (this follows from inspection of (3.5)), and $P_u(N-1)$ is negative (this follows from inspection of (2.7)).

The resulting control thus has the linear feedback form, with the gain modified by the caution and probing effects.
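A compact Python sketch of (4.13) follows. The sensitivities are obtained by differentiating (3.5) and (2.7) by hand; that differentiation, the default nominal control (the one-step cautious control, in the spirit of the "1 step" nominal control used in Section 6) and the function names are assumptions of this sketch. It refuses to return a control when the probing term makes the denominator non-positive, a case the text does not discuss.

```python
# Illustrative sketch of (4.13); function names and the nominal-control default are assumptions.
def cautious_gain_control(b_hat, P, q, xi, r):
    """One-step cautious control, form of (3.3)."""
    return q * xi * b_hat / (r + q * (b_hat ** 2 + P))


def linear_dual_control(b_hat, P, q1, xi1, q2, xi2, r, V, W, u_nom=None):
    """Two-step linear feedback dual control u*(N-2) of (4.13).

    b_hat, P: b_hat(N-2), P(N-2); q1, xi1: q(N-1), xi(N-1); q2, xi2: q(N), xi(N).
    u_nom: nominal control u_bar(N-2); defaults to the one-step cautious control.
    """
    if u_nom is None:
        u_nom = cautious_gain_control(b_hat, P, q1, xi1, r)
    S_nom = P * u_nom ** 2 + V + W
    P_bar = P * (V + W) / S_nom                              # (4.3)
    # (4.5), (4.6): derivatives of (3.5) w.r.t. b_hat^2 and P at (b_hat(N-2)^2, P_bar)
    D = r + q2 * (b_hat ** 2 + P_bar)
    J_b = -q2 ** 2 * xi2 ** 2 * (r + q2 * P_bar) / D ** 2    # negative
    J_P = q2 ** 3 * xi2 ** 2 * b_hat ** 2 / D ** 2           # nonnegative
    # (4.7): derivative of (2.7) w.r.t. u^2(N-2) at the nominal control
    P_u = -P ** 2 * (V + W) / S_nom ** 2                     # negative
    denom = r + q1 * (b_hat ** 2 + P) + (J_P - J_b) * P_u
    if denom <= 0.0:
        raise ValueError("probing term makes (4.13) ill-posed for these values")
    return q1 * xi1 * b_hat / denom


if __name__ == "__main__":
    print(cautious_gain_control(1.0, 1.0, 1.0, 1.0, 0.1),
          linear_dual_control(1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.1, 0.5, 0.5))
```

For $\hat b = P = q(N-1) = q(N) = \xi(N-1) = \xi(N) = 1$, $r = 0.1$ and $V = W = 0.5$, the cautious control is about 0.476 and the dual control about 0.570; the larger magnitude is the probing effect. Note that in this scalar case $J_P - J_b$ simplifies to $q^2(N)\xi^2(N)/D$.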

5. Extension to Multiple Input Multiple Output Model

The plant model is

$$x(k+1) = c + B\,u(k) + v(k) \qquad (5.1)$$

with

$$E\,v(k) = 0\,; \qquad E\,v(k)v'(j) = V\,\delta_{kj}$$

where $c$ is an unknown vector and $B$ a matrix with unknown parameters. The unknown elements of $c$ and $B$ are denoted as $\theta$, with covariance matrix $P$. In the helicopter vibration problem to be considered later, $c$ is the amplitude of the uncontrolled vibrations. The matrix $B$ is called the "transfer matrix" [M2] and represents the effect of the control on the vibration amplitude. The measurement is given by

$$y(k) = x(k) + w(k)$$

where

$$E\,w(k) = 0\,; \qquad E\,w(k)w'(j) = W\,\delta_{kj}\,; \qquad E\,v(k)w'(j) = 0$$

The control criterion to be minimized is the expected value of the cost from step 0 to $N$

$$J(0) = E\{C(0)\} = E\Big\{\sum_{k=1}^{N} x'(k)Qx(k) + u'(k-1)Ru(k-1)\Big\}$$

The last control is easily obtained by minimizing $C(N-1)$ and is

$$u^*(N-1) = -[R + E(B'QB \mid Y^{N-1})]^{-1}\,E(B'Qc \mid Y^{N-1})$$

Thus, inserting $u^*(N-1)$ in the cost, we get

$$J^*(N-1) = E(c'Qc \mid Y^{N-1}) + \mathrm{tr}(QV) - E(c'QB \mid Y^{N-1})[R + E(B'QB \mid Y^{N-1})]^{-1}E(B'Qc \mid Y^{N-1})$$
the vibration occurring in the airframe. The relationship between vibration output and higher harmonic control input is known to be nonlinear, and thus adaptive control solutions are required; in such cases fixed gain feedback controllers perform poorly. A simplified linear version of this problem (for two vibration components) can be represented by the plant equations [W3]

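$$x_1(k+1) = \theta_1 + \theta_2 u_1(k) + \theta_3 u_2(k) + v_1(k)$$

$$x_2(k+1) = \theta_4 + \theta_5 u_1(k) + \theta_6 u_2(k) + v_2(k) \qquad (6.1)$$

with

$$E\,v(k)v'(k) = V = \mathrm{diag}(V_i)\,; \qquad V_1 = 5^2,\ V_2 = 440^2 \qquad (6.2)$$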

The first state, $x_1$, represents the rotor hub force amplitude at a given frequency (one of the harmonics of the rotor r.p.m.); the second state, $x_2$, represents the rotor blade bending moment amplitude at the same frequency. The two controls are the "higher harmonic controls". These controls excite the rotor blades at higher harmonics of the rotational speed and cancel out some of the existing unsteady air loads [C1].

The measurements are

$$y_1(k) = x_1(k) + w_1(k)$$

$$y_2(k) = x_2(k) + w_2(k) \qquad (6.3)$$

with

$$E\,w(k)w'(k) = W = \mathrm{diag}(W_i)\,; \qquad W_1 = 28^2,\ W_2 = 440^2 \qquad (6.4)$$

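The initial parameter estimates are generated as $N(\theta_i, \sigma_i^2)$, $i = 1, \dots, 6$, where the true values are

$$\theta_1 = 287.3, \qquad \theta_2 = -25.1, \qquad \theta_3 = -32.5$$

$$\theta_4 = 4410, \qquad \theta_5 = 14.4, \qquad \theta_6 = -54.0$$

The cost weighting matrices are

$$Q = \mathrm{diag}(q_1, q_2)\,; \qquad R = \mathrm{diag}(r_1, r_2)$$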


10 


10 


,  r2  -  10 


5  x  10 
-4 


-8 


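In terms of the notation of Section 5,

$$c = \begin{bmatrix} \theta_1 \\ \theta_4 \end{bmatrix} \qquad (6.5)$$

$$B = \begin{bmatrix} \theta_2 & \theta_3 \\ \theta_5 & \theta_6 \end{bmatrix} \qquad (6.6)$$

$$u(k) = \begin{bmatrix} u_1(k) \\ u_2(k) \end{bmatrix} \qquad (6.7)$$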


The parameter vector to be estimated is

$$\theta(k) = [\theta_1\ \theta_2\ \theta_3\ \theta_4\ \theta_5\ \theta_6]' \qquad (6.8)$$

and it is modelled as time invariant

$$\theta(k+1) = \theta(k) \qquad (6.9)$$

with measurements

$$y_1(k) = H(k)[\theta_1\ \theta_2\ \theta_3]' + v_1(k) + w_1(k)$$

$$y_2(k) = H(k)[\theta_4\ \theta_5\ \theta_6]' + v_2(k) + w_2(k) \qquad (6.10)$$

where

$$H(k) = [1\ \ u_1(k)\ \ u_2(k)] \qquad (6.11)$$

In view of (6.2) and (6.4) the covariance matrix of $\theta(k)$ is block diagonal

$$P(k) = \begin{bmatrix} P_1(k) & 0 \\ 0 & P_2(k) \end{bmatrix} \qquad (6.12)$$


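The optimum cost for stage 1, assuming it is the last one, is

$$J^*(1) = E(c'Qc \mid Y^1) + \mathrm{tr}(QV) - E(c'QB \mid Y^1)[R + E(B'QB \mid Y^1)]^{-1}E(B'Qc \mid Y^1) \qquad (6.13)$$

The above can be rewritten as

$$J^*(1) = q_1(\hat\theta_1^2 + P_{1,1}(1)) + q_2(\hat\theta_4^2 + P_{4,4}(1)) + q_1V_1 + q_2V_2 - \frac{F^2D - 2EFG + G^2C}{CD - E^2} \qquad (6.14)$$

where

$$C = q_1(\hat\theta_2^2 + P_{2,2}(1)) + q_2(\hat\theta_5^2 + P_{5,5}(1)) + r_1$$

$$D = q_1(\hat\theta_3^2 + P_{3,3}(1)) + q_2(\hat\theta_6^2 + P_{6,6}(1)) + r_2$$

$$E = q_1(\hat\theta_2\hat\theta_3 + P_{2,3}(1)) + q_2(\hat\theta_5\hat\theta_6 + P_{5,6}(1))$$

$$F = q_1(\hat\theta_1\hat\theta_2 + P_{1,2}(1)) + q_2(\hat\theta_4\hat\theta_5 + P_{4,5}(1))$$

$$G = q_1(\hat\theta_1\hat\theta_3 + P_{1,3}(1)) + q_2(\hat\theta_4\hat\theta_6 + P_{4,6}(1)) \qquad (6.15)$$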
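The stage-1 quantities (6.13)-(6.15) reduce to $2 \times 2$ algebra. The plain-Python sketch below (hypothetical function name; 0-based indices, so $\hat\theta_i$ is `theta_hat[i-1]` and $P_{i,j}(1)$ is `P[i-1][j-1]`) also returns the corresponding cautious control $u = -[R + E(B'QB)]^{-1}E(B'Qc)$, using the assumption, consistent with (6.13)-(6.15), that $R + E(B'QB) = \begin{bmatrix} C & E \\ E & D \end{bmatrix}$ and $E(B'Qc) = [F\ G]'$.

```python
# Illustrative sketch of (6.13)-(6.15); the function name is hypothetical.
def mimo_cautious(theta_hat, P, q, r, V):
    """Stage-1 cautious control and cost for the 2x2 model (6.1).

    theta_hat: estimates [theta_1..theta_6]; P: 6x6 covariance as nested lists.
    q, r, V: pairs (q1, q2), (r1, r2), (V1, V2).
    """
    t = theta_hat
    q1, q2 = q
    C = q1 * (t[1] ** 2 + P[1][1]) + q2 * (t[4] ** 2 + P[4][4]) + r[0]
    D = q1 * (t[2] ** 2 + P[2][2]) + q2 * (t[5] ** 2 + P[5][5]) + r[1]
    E = q1 * (t[1] * t[2] + P[1][2]) + q2 * (t[4] * t[5] + P[4][5])
    F = q1 * (t[0] * t[1] + P[0][1]) + q2 * (t[3] * t[4] + P[3][4])
    G = q1 * (t[0] * t[2] + P[0][2]) + q2 * (t[3] * t[5] + P[3][5])
    det = C * D - E * E
    # u = -[[C, E], [E, D]]^{-1} [F, G]'
    u1 = -(D * F - E * G) / det
    u2 = -(C * G - E * F) / det
    # (6.14)
    J = (q1 * (t[0] ** 2 + P[0][0]) + q2 * (t[3] ** 2 + P[3][3])
         + q1 * V[0] + q2 * V[1]
         - (F * F * D - 2.0 * E * F * G + G * G * C) / det)
    return (u1, u2), J


if __name__ == "__main__":
    zero = [[0.0] * 6 for _ in range(6)]
    print(mimo_cautious([1.0, 2.0, 0.0, 3.0, 0.0, 4.0], zero,
                        (1.0, 1.0), (0.0, 0.0), (0.25, 0.5)))
```

With zero covariance and $R = 0$ the control reduces to $-B^{-1}c$ and the cost to $q_1V_1 + q_2V_2$; for example, $\hat\theta = (1, 2, 0, 3, 0, 4)$ gives $u = (-0.5, -0.75)$. Adding parameter uncertainty to the diagonal of $P$ shrinks the control magnitude (caution).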

The terms $J_P(1)$ are easily obtained from equation (6.14). The covariance update equation is

$$P_i(k) = P_i(k-1) - P_i(k-1)H'(k)[H(k)P_i(k-1)H'(k) + V_i + W_i]^{-1}H(k)P_i(k-1), \qquad i = 1, 2 \qquad (6.16)$$

The nominal covariance $\bar P_i(k)$ is obtained in terms of the previous $P_i(k-1)$ and a nominal control $\bar u(k-1)$ of the "1 step" type.

The sensitivity term $P_u(1)$ can be evaluated from the above.
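Because each block in (6.16) has a scalar measurement, the update needs no matrix inversion. A plain-Python sketch for one $3 \times 3$ block follows; the function name and the lumping of $V_i + W_i$ into a single argument are assumptions of the example.

```python
# Illustrative sketch of (6.16) for one block; the function name is hypothetical.
def covariance_update(P, u1, u2, noise_var):
    """One step of (6.16) for a 3x3 block P_i with H = [1, u1, u2].

    noise_var stands for V_i + W_i. P is a symmetric 3x3 nested list; a new list is returned.
    """
    H = [1.0, u1, u2]
    PH = [sum(P[i][j] * H[j] for j in range(3)) for i in range(3)]  # P H'
    s = sum(H[i] * PH[i] for i in range(3)) + noise_var             # H P H' + V_i + W_i
    # P - (P H')(H P) / s, using H P = (P H')' for symmetric P
    return [[P[i][j] - PH[i] * PH[j] / s for j in range(3)] for i in range(3)]


if __name__ == "__main__":
    eye = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for u in ((0.0, 0.0), (1.0, 1.0)):
        Pn = covariance_update(eye, *u, 1.0)
        print(u, sum(Pn[i][i] for i in range(3)))
```

Starting from $P_i = I$ with $V_i + W_i = 1$, the trace of the updated block is 2.5 for $u = (0, 0)$ and 2.25 for $u = (1, 1)$: a nonzero control also reduces the uncertainty of the gain parameters, which is the dependence the sensitivity $P_u(1)$ measures.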

The two-step dual control (5.14) was implemented for the above problem with a "sliding horizon" for a total of 20 steps. The evaluation criterion is

$$\sum_{k=1}^{N} x'(k)Qx(k)$$

Performance was evaluated from 100 Monte Carlo runs for the following cases:

1. Heuristic Certainty Equivalence,
2. One step ahead optimal stochastic cautious myopic,
3. Two step dual,
4. Modified two step dual.

The above runs were made for the case $\hat\theta_i(0) \sim N(\theta_i, \sigma_i^2)$.

Comparisons are made between the performances of the cautious and dual algorithms on the system, and a conventional statistical significance analysis is done using the normal theory approach [N1, W1]. The methodology is given in Appendix A. Tables I and II contain the results of the simulation runs. Table II indicates that the dual control has a lower cumulative cost than the other controllers over 10 time steps. Table I shows that the improvement of the dual over the cautious myopic controller is statistically significant (test statistic above 1.645, i.e., at the 5% level) from time step 4 onward, while at time steps 1 and 2 the dual incurs a higher cost.

The performances are compared in Figures 1-3. In Fig. 1 the HCE controller uses a very large control magnitude and drives the system hard. Thus in step 1 the vibration is increased compared to the cautious and dual controllers. This, however, helps to learn the parameters faster and reduces the vibration earlier than the others. In a realistic situation one cannot really use HCE because of the practical bounds on the control. The dual starts off higher than the cautious but behaves better after 2 steps.

Fig. 2 compares the cautious, dual and modified dual algorithms. As $\beta$ increases from 0 to 6 the vibration at step 1 increases. Values of $\beta$ from 3 onwards do not perform much better than $\beta = 2$ beyond step 3. Thus $\beta = 0, 1, 2$ are suggested for implementation, and the statistical tests were performed only for these values. Fig. 3 compares the cautious and dual controllers over a wider scale.

Single  Time  History  Runs 

Results of single time history runs over 20 time steps are plotted in Figs. 4-6 for the HCE, dual and cautious controllers. Figs. 4, 5, and 6 compare the controls U1, U2, and the cost, respectively, for the three cases.

For all the controllers the controls U1, U2 reach almost the same value at the end of 20 steps, although they start differently, indicating that the algorithms have learned the parameters. As a trade-off between rapid learning and smaller cost, the dual is the best of the three.


Algorithms compared:   Cautious myopic - Dual (β = 1)       Cautious myopic - Dual (β = 2)

Time step   Test statistic   Estimated           Test statistic   Estimated
k           Z_k              improvement         Z_k              improvement
                             EI_k(1) (%)                          EI_k(2) (%)
  1          -2.30             -7.19              -2.83            -20.64
  2          -0.36             -1.90              -0.21             -3.2
  3           1.26              4.79               0.37              3.19
  4           5.28             19.56               5.32             32.22
  5           3.53             23.21               7.94             44.48
  6           5.43             34.20               6.49             47.63
  7           4.40             32.51               5.43             40.67
  8           3.68             34.16               4.53             40.67
  9           2.94             29.16               3.84             36.04
 10           2.13             23.39               2.81             28.60

Table I. Statistical significance test for algorithm comparisons in the Example (100 Monte Carlo runs)


Average cost over 100 runs (C_k: average cost at step k; ΣC: cumulative cost from step 1 to k)

              β = 0            β = 1            β = 2             HCE
  k        C_k     ΣC       C_k     ΣC       C_k     ΣC       C_k      ΣC
  1        1.72    1.72     1.84    1.84     2.07    2.07     8.98     8.98
  2        1.59    3.31     1.63    3.47     1.65    3.72     4.61    13.56
  3        1.07    4.38     1.02    4.49     1.04    4.76     0.62    14.18
  4        0.87    5.25     0.70    5.19     0.59    5.35     0.23    14.41
  5        0.75    6.00     0.57    5.76     0.41    5.76     0.13    14.54
  6        0.66    6.66     0.44    6.20     0.35    6.11     0.13    14.67
  7        0.51    7.17     0.35    6.55     0.30    6.41     0.12    14.79
  8        0.46    7.63     0.30    6.85     0.27    6.68     0.12    14.91
  9        0.42    8.05     0.29    7.14     0.27    6.95     0.13    15.04
 10        0.38    8.43     0.29    7.43     0.27    7.22     0.13    15.17
Sum                8.43             7.43             7.22             15.17

Table II. Average costs for the four algorithms in the Example.
7. Conclusion

A suitable expansion of the cost-to-go in the stochastic dynamic programming equation can yield a linear controller that accounts for the controller's dual effect.

The simulation runs indicate that a dual controller in certain situations shows up to 49% improvement over the HCE and cautious controllers. Statistical analysis of Monte Carlo runs indicates that, on average, use of the dual controller provides approximately a 20% improvement in the performance criterion over the cautious controller.

With the HCE controller the parameters are learned faster than with the dual or cautious controllers, but the vibration cost is higher. As a trade-off between faster convergence and lower cost, the dual controller appears to be the best.

Appendix  A 

Statistical  Significance  in  the  Comparison  of  Controller  Performance 

Two control algorithms are compared by performing a Monte Carlo simulation. $S$ independent runs with the two algorithms, under the same homogeneous conditions, yield a set of i.i.d. samples $C_{ik}^{(1)}, C_{ik}^{(2)}$, $i = 1, 2, \dots, S$, from two distributions with true but unknown means $J_k^{(1)}$ and $J_k^{(2)}$, respectively, for each time step $k$.

The sample means

$$\bar C_k^{(j)} = \frac{1}{S}\sum_{i=1}^{S} C_{ik}^{(j)}, \qquad j = 1, 2$$

are point estimates of the respective true means. A statement that

$$J_k^{(1)} < J_k^{(2)}$$

indicating that algorithm 1 is better than 2 for time step $k$ has to be accompanied by a level of significance $\alpha$ of type I error.

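Thus we test the hypothesis

$$H_0: \Delta_k \triangleq J_k^{(2)} - J_k^{(1)} \le 0 \qquad \text{(algorithm 1 not better)} \qquad (A.3)$$

against the one-sided alternative

$$H_1: \Delta_k = J_k^{(2)} - J_k^{(1)} > 0 \qquad \text{(algorithm 1 better)} \qquad (A.4)$$

for a particular $\alpha$ level at each time step $k$.

This probability of error $\alpha$ is defined as

$$\alpha \triangleq P\{\text{accept } H_1 \mid H_0 \text{ true}\} \qquad (A.5)$$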

Since we get a set of data on the performances of the two algorithms on the plant under similar conditions, we regard it as a set of naturally paired observations.

We consider the sample differences

$$\Delta_{ik} = C_{ik}^{(2)} - C_{ik}^{(1)}, \qquad i = 1, \dots, S \qquad (A.6)$$

and this set of differences $\Delta_{ik}$ represents a sample with mean

$$\Delta_k = J_k^{(2)} - J_k^{(1)} \qquad (A.7)$$

Thus we have reduced the two-sample problem to a one-sample problem. The hypothesis is tested by examining whether $\Delta_k$ can be accepted as being positive with high confidence. The test statistic is


$$Z_k = \frac{\bar\Delta_k}{s_k/\sqrt{S}}$$

where

$$\bar\Delta_k = \frac{1}{S}\sum_{i=1}^{S}\Delta_{ik}, \qquad s_k^2 = \frac{1}{S-1}\sum_{i=1}^{S}(\Delta_{ik} - \bar\Delta_k)^2 \qquad (A.10)$$

The test statistic $Z_k$ has a t-distribution with $(S-1)$ degrees of freedom. For $S$ large ($> 50$), $Z_k$ has approximately a normal distribution, and the hypothesis $H_1$ is accepted if

$$Z_k > c$$

where $c$ is taken from the normal distribution tables. For a one-sided test with $\alpha = 0.05$, $c = 1.645$.
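The paired test of (A.6)-(A.10) takes only a few lines of standard-library Python. The sketch below (hypothetical function name) uses the normal critical value $c = 1.645$, so, as stated above, it is meant for $S > 50$ runs; it rejects samples whose differences have zero spread, where $Z_k$ is undefined.

```python
import math
import statistics


# Illustrative sketch of the paired one-sided test; the function name is hypothetical.
def paired_z_test(costs_1, costs_2, c=1.645):
    """Return (Z_k, accept_H1) for one time step k.

    costs_1, costs_2: per-run costs C_ik^(1), C_ik^(2) from the same S runs.
    H1 states that algorithm 1 has the lower mean cost.
    """
    if len(costs_1) != len(costs_2) or len(costs_1) < 2:
        raise ValueError("need two equally long samples with S >= 2")
    deltas = [c2 - c1 for c1, c2 in zip(costs_1, costs_2)]  # (A.6)
    mean = statistics.mean(deltas)
    s = statistics.stdev(deltas)  # sample standard deviation, 1/(S-1) normalization
    if s == 0:
        raise ValueError("differences have zero spread; Z_k is undefined")
    z = mean / (s / math.sqrt(len(deltas)))
    return z, z > c
```

Applied at each time step to the per-run costs of the dual controller (algorithm 1) and the cautious controller (algorithm 2), it would produce statistics of the kind listed in Table I.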

The estimated improvement for each time step $k$ is defined as

$$EI_k = \frac{\bar C_k^{(2)} - \bar C_k^{(1)}}{\bar C_k^{(2)}} \times 100\%$$
An adaptive dual control algorithm is presented for multiple-input, multiple-output (MIMO) linear systems with input and output noise and unknown parameters. The system parameters are assumed to belong to a finite set on which a prior probability distribution is available. The difficulties in characterizing the future evolution of MIMO system information, as required by the dynamic programming, are overcome through a novel way of using preposterior analysis. This provides a probabilistic characterization of the future adaptation process and allows the controller to take advantage of the dual effect.


1. Introduction

In the control of linear stochastic systems with known dynamics and quadratic cost, the Certainty Equivalence (CE) property [A1, B1] is known to hold. When the dynamics are incompletely known, however, due to parameter and noise covariance uncertainty in the system to be controlled, the CE property does not hold and the dynamic programming cannot, in general, be solved [A1]. As shown in [B2], the optimum control has the dual effect: it affects not just the future state of the system, but also the future state, parameter, and noise covariance uncertainty.

To circumvent this inability to compute the optimum solution, a number of adaptive suboptimal control strategies have been developed [S1, D1, A2, T1, W1]. Except for [T1, W1], however, most of these strategies are only passively adaptive [B1]; they do not use the knowledge that future learning will occur. An algorithm using such knowledge to improve its control decisions is called actively adaptive; the dual effect of the control is used to enhance the estimation and identification and, ultimately, the performance.

This paper presents an actively adaptive control algorithm for multiple-input, multiple-output (MIMO) linear stochastic systems where there is uncertainty in the measurements made on the system, and where the vector $\theta$ of constant but unknown system parameters and noise covariances is equal to one of $M$ known model vectors $\theta_j$, $j = 1, \dots, M$. The problem of control of multiple-model dynamic systems considered here is a significant generalization of the well-known "two-armed bandit problem".

The aspects which make the problem considered here quite general are the inclusion of dynamics with discrete uncertainties as well as continuous input and output noises. The algorithm extends the method presented in [W1], which was developed only for single-input, single-output stochastic systems with parameter uncertainty but only white input noise of known covariance. The algorithm for MIMO systems in general state-space form presented here is a more sophisticated suboptimal solution to the dynamic programming equation for the multiple model problem than the state-of-the-art algorithms [D1, S1, A2].

The algorithm presented in this paper, called the MIMO Model Adaptive Dual (MAD) control algorithm, overcomes the special difficulties posed by the MIMO system in characterizing the future evolution of information through a novel use of preposterior analysis. Approximate prior probability densities are obtained and used to characterize future learning. The result is an approximate solution to the stochastic dynamic programming, the exact solution of which would give the globally optimum (dual) control.

2. Problem Formulation

Consider controlling a MIMO linear stochastic system whose dynamics and measurements depend on an unknown vector $\theta$. The system state propagates in discrete time as

$$x(k+1) = A(\theta)x(k) + B(\theta)u(k) + D(\theta)w(k) \qquad (2.1)$$

where $x(k)$ is the state $n$-vector, $u(k)$ is the control $r$-vector, and $w(k)$ is a disturbance $d$-vector assumed zero mean, white, and Gaussian with variance $W(\theta)$. Imperfect system measurements are made:

$$y(k) = H(\theta)x(k) + v(k) \qquad (2.2)$$

where $y(k)$ is the measurement $q$-vector and $v(k)$ represents the measurement uncertainty, also taken as zero mean, white, and Gaussian with variance $V(\theta)$. $w(k)$ and $v(k)$ are assumed uncorrelated. The system matrices $A(\theta)$, $B(\theta)$, $D(\theta)$, $H(\theta)$ and noise covariances $W(\theta)$, $V(\theta)$ are known functions of the constant but unknown vector $\theta$, which is assumed equal to one of $M$ known constant model vectors $\theta_j$, $j = 1, \dots, M$, with corresponding known a priori probabilities:

$$P[\theta = \theta_j] = \lambda_j(0), \qquad j = 1, \dots, M \qquad (2.3)$$

$$\sum_{j=1}^{M} \lambda_j(0) = 1 \qquad (2.4)$$
The objective is to obtain a control sequence {u(0), ..., u(N-1)} minimizing

$$ J(0) = E[C(0)] \tag{2.5} $$

where the cost is quadratic about a given, time-varying reference trajectory:

$$ C(k) = [x(N) - x_r(N)]'\,Q(N)\,[x(N) - x_r(N)] + \frac{1}{2}\sum_{i=k}^{N-1} \Big\{ [x(i) - x_r(i)]'\,Q(i)\,[x(i) - x_r(i)] + [u(i) - u_r(i)]'\,R(i)\,[u(i) - u_r(i)] \Big\} \tag{2.6} $$

subject to equations (2.1)-(2.4). The information vector at time k, Z(k), consists of the measurements and controls up to k:

$$ Z(k) = \{y(0), y(1), \ldots, y(k), u(0), u(1), \ldots, u(k-1)\} \tag{2.7} $$

The optimum control u*(k) is a function of Z(k) and of the statistical description of the future measurements [B1]. It is obtained by solving the stochastic dynamic programming equation:

$$ J^*(k) = \min_{u(k)} E\Big\{ \tfrac{1}{2}[x(k)-x_r(k)]'\,Q(k)\,[x(k)-x_r(k)] + \tfrac{1}{2}[u(k)-u_r(k)]'\,R(k)\,[u(k)-u_r(k)] + J^*(k+1) \;\Big|\; Z(k), u(k) \Big\} \tag{2.8} $$

The exact solution of (2.8) is impossible due to the "curse of dimensionality": the parameter and noise covariance uncertainty prevent the exact computation of E[J*(k+1)|Z(k), u(k)]. Suboptimal algorithms that circumvent this difficulty have largely consisted of two approaches. The first is the Heuristic Certainty Equivalence (HCE) algorithm [B1], where

$$ \hat\theta(k) = \sum_{j=1}^{M} \lambda_j(k)\,\theta_j \tag{2.9} $$

is assumed to be the true parameter vector. The second is the Deshpande-Upadhyay-Lainiotis (DUL) algorithm [D1], where the model-optimal controls u_j(k) are computed and the actual control is taken as

$$ u(k) = \sum_{j=1}^{M} \lambda_j(k)\,u_j(k) \tag{2.10} $$

The actively adaptive Model Adaptive Dual (MAD) control algorithm developed in [W1] for systems in input-output form achieved significant performance superiority over the passively adaptive (nondual) HCE and DUL algorithms. It did so by directly obtaining an accurate approximation of E[J*(k+1)|Z(k), u(k)].

3. Approximate Solution of the Stochastic Dynamic Programming Equation by Pairwise Preposterior Model Discrimination

The computation of E[J*(k+1)|Z(k), u(k)] in the solution of the stochastic dynamic programming equation (2.8) for M models can be reduced to computing M(M-1)/2 two-model costs, using a result which may be found in [W1]. Only the two-model cost approximation will be developed here, using models θ_1 and θ_2. The prior probabilities at k in the two-model problem are

$$ P[\theta=\theta_1 \mid Z(k), u(k)] = \Pi(k), \qquad P[\theta=\theta_2 \mid Z(k), u(k)] = 1 - \Pi(k) \tag{3.1} $$

For computational feasibility the cost is approximated as follows. The future controls (i ≥ k+1) are assumed to have the DUL-type structure, with time-varying probabilities as more information becomes available to the controller. Thus

$$ E[J^*(k+1) \mid Z(k), u(k)] \approx E\Big\{ \min_{L(k+1)} E[C(k+1) \mid Z(k+1), L(k+1)] \;\Big|\; Z(k), u(k) \Big\} \tag{3.2} $$

where L(k+1) is the set of parameters in the controller structure from k+1 through the end. Using the total probability theorem, the (approximation of the) optimum cost-to-go may be written as

$$ J^*(k+1) = \min_{L(k+1)} \Big\{ \Pi(k+1)\,E[C(k+1) \mid Z(k+1), L(k+1), \theta=\theta_1] + [1-\Pi(k+1)]\,E[C(k+1) \mid Z(k+1), L(k+1), \theta=\theta_2] \Big\} \tag{3.3} $$

where by Bayes' rule

$$ \Pi(k+1) = P[\theta=\theta_1 \mid Z(k+1)] = \left\{ 1 + \frac{1-\Pi(k)}{\Pi(k)} \cdot \frac{p[y(k+1) \mid Z(k), u(k), \theta=\theta_2]}{p[y(k+1) \mid Z(k), u(k), \theta=\theta_1]} \right\}^{-1} \tag{3.4} $$

with the appropriate Gaussian densities in (3.4) being

$$ p[y(k+1) \mid Z(k), u(k), \theta=\theta_j] = \mathcal{N}\big[y(k+1);\ \hat y_j(k+1|k),\ S_j(k+1|k)\big] \tag{3.5} $$

The means and covariances in (3.5) are obtained from two Kalman filters, matched to θ = θ_j, j = 1, 2, respectively.

Next note that (3.2) requires performing a multiple integration over the elements of y(k+1). This is not computationally feasible in general, and it will be avoided through the following procedure. From (3.4) and (3.5) it can be seen that y(k+1) and Π(k+1) are related through a mapping described by

$$ \tfrac{1}{2}(y-\hat y_1)'\,S_1^{-1}\,(y-\hat y_1) - \tfrac{1}{2}(y-\hat y_2)'\,S_2^{-1}\,(y-\hat y_2) = \ln\left\{ \frac{|S_2|^{1/2}}{|S_1|^{1/2}} \cdot \frac{\Pi(k)\,[1-\Pi(k+1)]}{[1-\Pi(k)]\,\Pi(k+1)} \right\} \tag{3.6} $$


where the time arguments of y(k+1), ŷ_j(k+1|k) and S_j(k+1|k) have been dropped. A given Π(k+1) may result from an infinite number of values of y(k+1). Π(k+1) is therefore not a sufficient statistic for y(k+1), but it can serve as an "approximate sufficient statistic".
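To make the mapping from y(k+1) to Π(k+1) concrete, the Python sketch below has two steps. First it forms the model-matched predictions ŷ_j(k+1|k) and S_j(k+1|k) of (3.5) with a standard Kalman filter prediction step. Then it applies the two-model Bayes rule (3.4). The function names are illustrative only and are not taken from [W1] or [W2]. The usage lines show two different measurements that give the same Π(k+1), which is why Π(k+1) is only an approximate sufficient statistic.

```python
import numpy as np

def kf_predict_measurement(x_hat, P, u, A, B, D, W, H, V):
    """Kalman filter prediction for one model theta_j.

    Returns y_hat_j(k+1|k) and S_j(k+1|k), the mean and covariance in (3.5).
    """
    x_pred = A @ x_hat + B @ u              # x_hat_j(k+1|k)
    P_pred = A @ P @ A.T + D @ W @ D.T      # P_j(k+1|k)
    y_hat = H @ x_pred                      # predicted measurement mean
    S = H @ P_pred @ H.T + V                # innovation covariance
    return y_hat, S

def log_gauss(y, mean, cov):
    """Log of the multivariate normal density N[y; mean, cov]."""
    r = y - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (r @ np.linalg.solve(cov, r) + logdet + len(y) * np.log(2.0 * np.pi))

def posterior_update(pi_k, y, y_hat1, S1, y_hat2, S2):
    """Two-model Bayes rule (3.4); the likelihood ratio is formed in logs."""
    log_ratio = log_gauss(y, y_hat2, S2) - log_gauss(y, y_hat1, S1)
    return 1.0 / (1.0 + (1.0 - pi_k) / pi_k * np.exp(log_ratio))

# Hypothetical predictions: equal covariances, means differing along the first axis.
I2 = np.eye(2)
y1_hat, y2_hat = np.array([1.0, 0.0]), np.array([-1.0, 0.0])
for y in (np.array([0.3, 0.0]), np.array([0.3, 5.0])):
    print(posterior_update(0.5, y, y1_hat, I2, y2_hat, I2))  # same value twice
```

With equal covariances, only the component of y along ŷ_1 - ŷ_2 affects the likelihood ratio. Every measurement on a hyperplane orthogonal to that direction therefore maps to the same Π(k+1).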

Thus (3.3) may be rewritten as

$$ J^*(k+1) = \min_{L(k+1)} \Big\{ \Pi(k+1)\,E[C(k+1) \mid Z(k), u(k), \Pi(k+1), L(k+1), \theta=\theta_1] + [1-\Pi(k+1)]\,E[C(k+1) \mid Z(k), u(k), \Pi(k+1), L(k+1), \theta=\theta_2] \Big\} \tag{3.7} $$

The outer expectation of (3.2) over y(k+1) is then replaced by an expectation with respect to p[Π(k+1)|Z(k), u(k)]. This is the preposterior probability density [R1] of Π(k+1), the "model information state" at k+1. An approximate preposterior density with two delta functions at locations Π_1(k+1) and Π_2(k+1) is used, as in [B3, W1].

With an implementable preposterior density established, the next step is to carry out the minimization in (3.7) with respect to the time-varying future controller parameter set L(k+1), a set that of course depends on Π(k+1). An easily implemented approximate solution to this minimization is obtained by assuming a future sequence of DUL controls represented by L(k+1):

$$ E[J^*(k+1) \mid Z(k), u(k)] \approx \hat J(k+1) = \int \Big\{ \Pi(k+1)\,E[C(k+1) \mid Z(k), u(k), \Pi(k+1), L(k+1), \theta=\theta_1] + [1-\Pi(k+1)]\,E[C(k+1) \mid Z(k), u(k), \Pi(k+1), L(k+1), \theta=\theta_2] \Big\}\; p[\Pi(k+1) \mid Z(k), u(k)]\; d\Pi(k+1) \tag{3.8} $$
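To compare the two passive policies that the future DUL controls in (3.8) build on, the sketch below contrasts the HCE control (2.9) with the DUL control (2.10) for a two-model problem. It makes several simplifying assumptions, and none of it is the authors' implementation:

- it regulates about zero (x_r = 0, u_r = 0);
- it uses the same weighting Q at every stage and at the terminal time;
- θ enters only through B, as in the numerical example;
- the matrix A in the usage lines is hypothetical, and all names are illustrative.

```python
import numpy as np

def lq_gains(A, B, Q, R, horizon):
    """Finite-horizon LQ gains K(0), ..., K(horizon-1) for u(i) = -K(i) x(i).

    Backward Riccati recursion with stage and terminal weight Q.
    Requires R + B'PB to be invertible at every step.
    """
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # gains[i] is the gain applied at stage i

def dul_control(x, lambdas, gain_sets, i):
    """DUL (2.10): probability-weighted sum of the model-optimal controls."""
    return sum(lam * -(gains[i] @ x) for lam, gains in zip(lambdas, gain_sets))

def hce_control(x, lambdas, A, Bs, Q, R, horizon, i):
    """HCE (2.9): LQ control designed for the probability-weighted parameter."""
    B_hat = sum(lam * B for lam, B in zip(lambdas, Bs))
    return -(lq_gains(A, B_hat, Q, R, horizon)[i] @ x)

# Hypothetical two-model example in which theta enters only through B.
A = np.array([[1.0, 1.0], [0.0, 1.0]])
Bs = [np.array([[0.45], [2.0]]), np.array([[0.9], [1.0]])]
Q, R, N = np.eye(2), np.array([[0.1]]), 5
gain_sets = [lq_gains(A, B, Q, R, N) for B in Bs]
x0, lam = np.array([0.0, 0.1]), [0.5, 0.5]
print(dul_control(x0, lam, gain_sets, 0), hce_control(x0, lam, A, Bs, Q, R, N, 0))
```

The two policies coincide when one model probability is 1. They generally differ otherwise, and neither uses the control to generate information, which is what MAD adds.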


Using the two delta function preposterior density above and performing the integration gives the approximate cost-to-go resulting from a particular control decision u(k):

$$ \hat J(k+1) = \Pi(k)\,\Pi_1(k+1)\,J_{11}[k+1, u(k), L_{11}(k+1), \theta=\theta_1] + \Pi(k)\,[1-\Pi_1(k+1)]\,J_{12}[k+1, u(k), L_{12}(k+1), \theta=\theta_2] + [1-\Pi(k)]\,\Pi_2(k+1)\,J_{21}[k+1, u(k), L_{21}(k+1), \theta=\theta_1] + [1-\Pi(k)]\,[1-\Pi_2(k+1)]\,J_{22}[k+1, u(k), L_{22}(k+1), \theta=\theta_2] \tag{3.9} $$

The nominal sequences of control parameters L_mj(k+1), m, j = 1, 2, come from a time-varying DUL weighted sum of model-optimal controls. This sum is computed with nominal weighting factors given by:

(i) Π(k+1) = Π_m(k+1) as the sufficient statistic for θ at k+1;

(ii) subsequent nominal posterior probabilities Π_mj(i) that θ = θ_1, which evolve for i = k+2, ..., N-1 when this DUL control is applied to the system with θ = θ_j.

The single-model optimal control parameters are obtained from a standard linear quadratic problem with θ known. The costs J_mj are obtained from a recursion for the linear system with θ = θ_j and quadratic cost, using a DUL control policy with control parameters L_mj(k+1). Details of the nominal posterior probability generation and of the recursions for J_mj are contained in [W2].

4. Numerical Studies

A second-order system is considered with the following two-model system description:

A(θ_1) = A(θ_2) (2×2; its entries include 1.4 and -0.49)
B(θ_1) = [0.45 2]', B(θ_2) = [0.9 1]'
D(θ_1) = D(θ_2) = diag(1, 1)
H(θ_1) = H(θ_2) = diag(1, 1)
W(θ_1) = W(θ_2) = diag(10^-4, 2.25)
V(θ_1) = V(θ_2) = diag(10, 10^-2)

A priori, P(θ = θ_1) = P(θ = θ_2) = 0.5. The control objective is to take the initial state x(0) = [0 0.1]' and make it follow the state reference trajectory x_r(1), ..., x_r(5) over N = 5 time stages, with x_r(1) = [0 0.5]'. The quadratic weighting matrices are

Q(0) = 0, Q(1) = Q(2) = Q(3) = diag(0, 1), Q(4) = diag(0, 5), Q(5) = diag(0, 50).

There was no penalty associated with the control: R(k) = 0 for all k.

The first test was to compute the sample means and sample standard deviations of the cost samples C^OPT, C^HCE, C^DUL and C^MAD. The results are contained in Table 1.


Algorithm    Sample mean    Sample standard deviation
OPT          60.97          73.9
HCE          269.3          443.6
DUL          223.4          406.4
MAD          110.4          137.8

Table 1. Sample Average Costs and Standard Deviations


This table gives the first indication of the superiority of MAD over HCE and DUL, both in mean cost and in cost variability. MAD reduces the mean cost by 51% relative to DUL and by 59% relative to HCE. It reduces the cost variability (sample standard deviation) by 66% relative to DUL and by 69% relative to HCE.
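The percentages quoted above follow directly from Table 1. Here is a short Python check; the helper name is illustrative:

```python
def percent_reduction(baseline, new):
    """Relative reduction of `new` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline

# (sample mean, sample standard deviation) for each algorithm, from Table 1
table1 = {
    "OPT": (60.97, 73.9),
    "HCE": (269.3, 443.6),
    "DUL": (223.4, 406.4),
    "MAD": (110.4, 137.8),
}

for ref in ("DUL", "HCE"):
    mean_cut = percent_reduction(table1[ref][0], table1["MAD"][0])
    std_cut = percent_reduction(table1[ref][1], table1["MAD"][1])
    print(f"MAD vs {ref}: mean cost -{mean_cut:.0f}%, std. deviation -{std_cut:.0f}%")
# MAD vs DUL: mean cost -51%, std. deviation -66%
# MAD vs HCE: mean cost -59%, std. deviation -69%
```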

Are these results truly statistically significant? Are the true means ordered as the sample means would indicate? To answer these questions, a rigorous statistical test for the comparison of controller performances was developed in [W1]. Table 2 gives, for each pair of algorithms, the sample mean of the cost differences and the standard deviation of that sample mean. The results lead to accepting the hypotheses that MAD is better than both HCE and DUL.

Algorithms    Δ̄         σ_Δ̄       Significance test      Estimated
compared                           statistic Δ̄/σ_Δ̄       improvement (%)
HCE - DUL     45.866    13.767    3.3316                  17
HCE - MAD     158.86    29.881    5.3164                  59
DUL - MAD     112.99    27.033    4.1797                  51

Table 2. Statistical test results for algorithm comparisons
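The ratio column of Table 2 is the mean cost difference divided by its standard deviation. The test itself is developed in [W1] and is not reproduced here. The sketch below only shows how such a ratio can be formed from paired Monte Carlo cost samples. It assumes that each run applies both controllers to the same noise and parameter realization, which is an assumption of this sketch.

```python
import math
import statistics

def paired_ratio(costs_a, costs_b):
    """Mean of paired differences C_a - C_b, the standard deviation of that
    mean, and their ratio. A large positive ratio favours algorithm b."""
    if len(costs_a) != len(costs_b) or len(costs_a) < 2:
        raise ValueError("need two equal-length samples with at least two runs")
    diffs = [a - b for a, b in zip(costs_a, costs_b)]
    mean_diff = statistics.fmean(diffs)
    std_of_mean = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff, std_of_mean, mean_diff / std_of_mean

# The reported ratios follow from the first two numeric columns of Table 2.
for name, d, s in [("HCE - DUL", 45.866, 13.767),
                   ("HCE - MAD", 158.86, 29.881),
                   ("DUL - MAD", 112.99, 27.033)]:
    print(name, round(d / s, 4))  # 3.3316, 5.3164, 4.1797
```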

Table 3 illustrates how MAD senses the need for active learning. For various possible values of the control decision at period 1, MAD evaluates the future learning opportunities and calculates the future costs. For u(1) = 4.35, the preposterior density characterized by Π_1(2) and Π_2(2) indicates that not enough learning will take place to minimize the effect of the term J_21(2) in the cost-to-go equation (3.9). J_21(2) represents the cost of a mismatched controller that does not learn fast enough what the true system is. For larger u(1) the learning is faster, but beyond u(1) ≈ 5.09 the price of learning exceeds the benefit.

Table 3 also shows how to determine a priori (or even on line), in a non-Monte Carlo fashion, when it is valuable (and necessary) to use an active, dual control decision-making algorithm like MAD. This is the case when the penalty for a mismatched controller is large and its probabilistic contribution to the cost is significant. In these cases active adaptation can be expected to improve the transient behavior in adaptive control by speeding up the adaptation process.

        u(1)   Ĵ(2)    Π_1(2)   Π_2(2)    J_11(2)  J_12(2)  J_21(2)  J_22(2)
        4.35   126.5   .9416    .05747    64.93    68.49    2153.    66.97
        4.55   104.0   .9488    .05039    65.68    68.04    1557.    66.45
        4.75   76.00   .9554    .04389    66.59    67.80    504.5    65.98
 MAD    5.09   67.76   .9653    .03415    68.56    67.66    116.2    65.25
        5.35   68.19   .9713    .02824    70.29    67.75    114.4    64.79
        5.55   68.78   .9754    .02421    71.85    67.93    121.5    64.48
        5.75   69.47   .9790    .02067    73.56    68.20    129.8    64.20


Table  3.  Cost  Breakdown  and  Learning  for  MAD 
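To see how the entries of Table 3 combine through the four-term cost-to-go (3.9), the sketch below evaluates Ĵ(2) for each candidate u(1) and picks the smallest. The current probability Π(1) is not reported in the text. The value 0.5 used here (the a priori probability) is an assumption. With it, the computed values match the Ĵ(2) column to within about 0.5, and the minimum falls at the MAD decision u(1) = 5.09.

```python
def two_model_cost(pi_k, pi1, pi2, J11, J12, J21, J22):
    """Approximate cost-to-go (3.9) under the two-delta preposterior density.

    pi_k: current probability that theta = theta_1; pi1, pi2: delta locations;
    J_mj: nominal cost on preposterior branch m when theta = theta_j.
    """
    return (pi_k * pi1 * J11 + pi_k * (1.0 - pi1) * J12
            + (1.0 - pi_k) * pi2 * J21 + (1.0 - pi_k) * (1.0 - pi2) * J22)

# Rows of Table 3: u(1), Pi_1(2), Pi_2(2), J11(2), J12(2), J21(2), J22(2)
rows = [
    (4.35, 0.9416, 0.05747, 64.93, 68.49, 2153.0, 66.97),
    (4.55, 0.9488, 0.05039, 65.68, 68.04, 1557.0, 66.45),
    (4.75, 0.9554, 0.04389, 66.59, 67.80, 504.5, 65.98),
    (5.09, 0.9653, 0.03415, 68.56, 67.66, 116.2, 65.25),
    (5.35, 0.9713, 0.02824, 70.29, 67.75, 114.4, 64.79),
    (5.55, 0.9754, 0.02421, 71.85, 67.93, 121.5, 64.48),
    (5.75, 0.9790, 0.02067, 73.56, 68.20, 129.8, 64.20),
]
PI_1 = 0.5  # assumption: Pi(1) is not given in the text
costs = {u: two_model_cost(PI_1, *rest) for u, *rest in rows}
best_u = min(costs, key=costs.get)
print(best_u, round(costs[best_u], 2))  # 5.09 67.76
```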


5.  Concluding  Remarks 

An actively adaptive control algorithm has been derived for multiple-input, multiple-output stochastic systems in general state-space form possessing both continuous and discrete modes of system uncertainty. The algorithm, called Model Adaptive Dual Control, is the only actively adaptive controller for this class of systems. Rigorous statistical tests showed a statistically significant performance improvement of the new actively adaptive MIMO MAD algorithm over two state-of-the-art passively adaptive control algorithms. In particular, when there is a heavy terminal state penalty and the control period is relatively short, passive learning often does not suffice.
Abstract. An adaptive dual-control guidance algorithm is presented for intercepting a moving target in the presence of an interfering target (decoy) in a stochastic environment. Two sequences of measurements are obtained at discrete points in time; however, it is not certain which sequence came from the target of interest and which from the decoy. Associated with each track, the interceptor also receives noisy, state-dependent feature measurements. The optimum control for the interceptor is given by the solution of the stochastic dynamic programming equation, but it is not numerically feasible to obtain. An approximate solution of this equation is obtained by evaluating the value of future information gathering. This is done through preposterior analysis: approximate prior probability densities are obtained and used to describe the future learning and control. In this way, the interceptor control is used for information gathering, in order to reduce the future target and decoy inertial measurement errors and to enhance the observable target/decoy feature differences for subsequent discrimination between the true target and the decoy. Simulation studies have shown the effectiveness of the scheme.


1. INTRODUCTION

A NEW control-decision strategy is developed for intercepting a moving target that uses a defensive decoy, in an environment best described by a stochastic process. The decision-making problem takes place during the terminal phase of interceptor guidance.

At discrete points in time the interceptor receives noisy, state-dependent feature measurements: one from the true target and one from the decoy. It is assumed that there is no measurement-to-track association uncertainty. However, it is not certain which measurement sequence came from the target and which from the decoy. Additional sources of uncertainty are the imperfect, noise-corrupted state observations and the inherently unknown time-to-intercept. The result is a highly nonlinear stochastic control and decision-making problem. It has both continuous (all noises) and discrete (track identity) sources of uncertainty, and the control has a dual effect (Feldbaum, 1965). In addition to its effect on the relative interceptor/target/decoy states themselves, the present interceptor control also affects the future feature observation process, and hence the target/decoy identification uncertainty. Specifically, the interceptor control must be used for information gathering about the true target track in two ways:

(a) reducing future target and decoy inertial measurement errors by changing its own state and hence the relative states;

(b) enhancing observable target/decoy feature differences for subsequent discrimination between the true target and the decoy.

All of these information-theoretic characteristics are functions of the interceptor/target/decoy states, which are in turn directly affected by the interceptor control. The decisions must also simultaneously serve the function of guiding the interceptor toward the target (control proper, which is inseparable from the information gathering). The problem is further complicated by certain constraints: maximum fuel capability and, possibly, maximum time-to-intercept and interceptor state constraints.

*Received 2 May 1983; revised January 1984. Research supported by AFOSR. The original version of this paper was not presented at any IFAC meeting. This paper was recommended for publication in revised form by Associate Editor H. Sorenson under the direction of Editor B. D. O. Anderson.
†Department of Electrical Engineering and Computer Science, University of Connecticut, Storrs, CT 06268, U.S.A.

This  is  an  example  of  a  nonlinear  stochastic 
control  problem  in  which  the  optimum  solution 
exhibits  an  inseparability  between  the  dual  actions 
of  the  control  decision  in  gathering  information 
about  the  partially  unknown  system  (reducing 
uncertainty),  and  simultaneously  changing  the 
system  state  itself  (the  control  function  proper, 
which  requires  minimum  uncertainty  or  maximum 
information  about  the  state).  In  general,  systems 
with  both  continuous  and  discrete  nonlinear 
probabilistic  structures  create  decision-making 


Dual  control  guidance  for  simultaneous  identification  and  interception 




Gaussian, zero mean (WGZM) with known covariance Q_l.

Let the motion of the interceptor be given by

$$ x_I(k+1) = A_I\,x_I(k) + B\,u(k) + G_I\,w_I(k), \qquad k = 0, 1, 2, \ldots \tag{2} $$


where u(k) is the interceptor control vector to be determined at time k, and w_I(k) is the process noise, WGZM with known covariance Q_I.

The measurement equations of the two vehicles are

$$ z_l(k) = H_l\,x_l(k) + v_l(k), \qquad l = 1, 2, \quad k = 1, 2, \ldots \tag{3} $$

where v_l(k) is the measurement noise, WGZM with known covariance R_l.

The measurement equation associated with the interceptor is

$$ z_I(k) = H_I\,x_I(k) + v_I(k), \qquad k = 1, 2, \ldots \tag{4} $$

where v_I(k), the measurement noise of the interceptor, is assumed to be WGZM with known covariance R_I.

To discriminate between the two vehicles, a feature measurement β_l(k) associated with each vehicle l is obtained. For simplicity, this feature measurement is assumed to be a scalar. It is a function of the state of the vehicle x_l(k), the state of the interceptor x_I(k) and the true feature φ_l, that is

$$ \beta_l(k) = f[\phi_l, x_l(k), x_I(k)] + \alpha_l(k), \qquad l = 1, 2, \quad k = 1, 2, \ldots \tag{5} $$

Here it is assumed that φ_1 ≠ φ_2, for identification purposes. α_l(k) is additive white noise, independent of the states and assumed normal with zero mean.

All the noise sequences are assumed to be mutually independent.

In this formulation of the target/decoy interception problem, it is assumed that the vehicles follow the state equations (1) without changing their state models. The extension of this formulation to the case of the target/decoy changing its state model is discussed in the example section. This model can also easily be extended to the case of state-dependent feature measurement noise.

The following notations are used:

$$ Z^k = \{z_l(i), z_I(i):\ l = 1, 2;\ i = 1, \ldots, k\} \tag{6} $$

$$ \beta^k = \{\beta_l(i):\ l = 1, 2;\ i = 1, \ldots, k\} \tag{7} $$

$$ U^k = \{u(i):\ i = 0, 1, \ldots, k\} \tag{8} $$

$$ \pi(k) = P[\theta = 1 \mid Z^k, \beta^k, U^{k-1}] \tag{9} $$

where θ = j, j = 1, 2, represents the event that the jth track is the track originating from the target of interest.

The interceptor's objective is to choose the control strategy u(k) that minimizes the expected weighted relative position of the target/interceptor (T/I) at the unknown (random) terminal time N. This minimization is subject to the dynamic control effort bound and the speed limit of the interceptor. For the problem to be meaningful, it is assumed that the interceptor is capable of intercepting either of the vehicles in finite time. This leads to the stochastic control cost criterion to be minimized at time k:

$$ J(k) = E[C(k)] = E\Big[ \sum_{i=k}^{N-1} u'(i)\,R(i)\,u(i) + g'[x_\theta(N), x_I(N)]\; Q\; g[x_\theta(N), x_I(N)] \;\Big|\; Z^k, \beta^k, U^k \Big] \tag{10} $$


subject to

$$ |u_n(i)| \le u_n^{\max}(i) \qquad \forall n,\ \forall i \ge k \tag{11} $$

and

$$ v_I(i+1) \le v_I^{\max} \qquad \forall i \ge k \tag{12} $$

The quantities in (10)-(12) are defined as follows:

- R(i) is a known (time-varying) control weighting matrix.
- g[x_θ(N), x_I(N)] is a vector-valued function whose components are the position differences between the states x_θ(N) and x_I(N).
- Q is a known constant weighting matrix associated with this relative terminal T/I position.
- u_n(i) is the nth component of the control vector u(i).
- u_n^max(i) is a known, time-varying dynamic control effort bound, which depends on the kinematic acceleration capability of the interceptor.
- v_I(i), a function of x_I(i), is the interceptor's speed at time i, and v_I^max is a known speed limit of the interceptor.

Let an admissible control decision vector u(k) be a function of Z^k and β^k, as well as of the statistical description of the future observations (Bar-Shalom and Tse, 1976). Then the optimum control strategy for this nonlinear stochastic control problem is obtained by applying Bellman's Principle of Optimality, which leads to the stochastic dynamic programming (SDP) equation. Solution of the SDP equation yields the globally optimal control, which in general has the dual effect (Bar-Shalom and Tse, 1974; Feldbaum, 1965). At time k, the SDP equation for this problem is

$$ J^*(k) = \min_{u(k)} E\big[ u'(k)\,R(k)\,u(k) + J^*(k+1) \;\big|\; Z^k, \beta^k, U^k \big] \tag{13} $$
subject to

$$ |u_n(k)| \le u_n^{\max}(k) \qquad \forall n \tag{14} $$

and

$$ v_I(k+1) \le v_I^{\max} \tag{15} $$

Here, for a given u(k), J*(k+1) is the optimum cost-to-go from time k+1 to the unknown terminal time N. The expectation is taken with respect to all future random variables, including both the inertial observation errors and the feature parameter observation errors.

The exact solution to this problem is impossible for two reasons: no distribution over N is available, and there is the "curse of dimensionality" (Bellman, 1961). Avoiding this would require a recursion for the cost-to-go, which here does not exist because of the track uncertainty. We next present an approximate solution of this problem.

3. APPROXIMATE SOLUTION OF THE STOCHASTIC DYNAMIC PROGRAMMING EQUATION

For computational feasibility, the cost is approximated as follows. The future controls (i ≥ k+1) are assumed to be of the DUL type (the "partitioned" control obtained by Deshpande, Upadhyay and Lainiotis, 1973):

$$ u(i) = \pi(i)\,u_1(i) + [1 - \pi(i)]\,u_2(i) \tag{16} $$

Here u_j(i) is the bounded optimum control at time i given θ = j, and the weights are time-varying probabilities that change as more information becomes available to the controller. The controls u_1(i) and u_2(i) satisfy the constraints (11) and (12), as shown in the next section. With this, the optimal cost-to-go in (13) is replaced by

$$ E[J^*(k+1) \mid Z^k, \beta^k, U^k] \approx E\Big\{ \min_{L(k+1)} E[C(k+1) \mid Z^{k+1}, \beta^{k+1}, U^k, L(k+1)] \;\Big|\; Z^k, \beta^k, U^k \Big\} \tag{17} $$

where L(k+1) is the set of parameters in the controller structure from k+1 through the end, and C(k+1) is the cost function. Using the total probability theorem, the (approximation of the) optimum cost-to-go may be written as

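$$ J^*(k+1) \approx \min_{L(k+1)} \Big\{ \pi(k+1)\,E[C(k+1) \mid Z^{k+1}, \beta^{k+1}, U^k, L(k+1), \theta=1] + [1-\pi(k+1)]\,E[C(k+1) \mid Z^{k+1}, \beta^{k+1}, U^k, L(k+1), \theta=2] \Big\} \tag{18} $$

where by Bayes' rule

$$ \pi(k+1) = P[\theta = 1 \mid Z^{k+1}, \beta^{k+1}, U^k] = \left\{ 1 + \frac{1-\pi(k)}{\pi(k)} \cdot \frac{p[z(k+1), \beta(k+1) \mid Z^k, \beta^k, U^k, \theta=2]}{p[z(k+1), \beta(k+1) \mid Z^k, \beta^k, U^k, \theta=1]} \right\}^{-1} \tag{19} $$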

where

$$ z(k+1) = [z_1(k+1),\ z_2(k+1),\ z_I(k+1)]' \tag{20} $$

and

$$ \beta(k+1) = [\beta_1(k+1),\ \beta_2(k+1)]' \tag{21} $$

Here z(k+1) and β(k+1) are, respectively, the (column) vectors of all state measurements and feature measurements at time k+1.

Assume that the conditional joint density of z(k+1), β(k+1) in (19) is known or can be obtained. The computation of (17) then requires a multiple integration over the elements of (20) and (21). This is not computationally feasible and is avoided as follows. The mapping from z(k+1) and β(k+1) to π(k+1) is not one-to-one (in fact, it is many-to-one), so π(k+1) is not a sufficient statistic for z(k+1) and β(k+1). However, π(k+1) can serve as an "approximate sufficient statistic". Using this approximate statistic in (18), and replacing the outer expectation of (17) over z(k+1) and β(k+1) by an expectation over π(k+1), results in

$$ E[J^*(k+1) \mid Z^k, \beta^k, U^k] \approx \int_0^1 \min_{L(k+1)} \Big\{ \pi(k+1)\,E[C(k+1) \mid Z^k, \beta^k, U^k, \pi(k+1), L(k+1), \theta=1] + [1-\pi(k+1)]\,E[C(k+1) \mid Z^k, \beta^k, U^k, \pi(k+1), L(k+1), \theta=2] \Big\}\; p[\pi(k+1) \mid Z^k, \beta^k, U^k]\; d\pi(k+1) \tag{22} $$

where p[π(k+1)|Z^k, β^k, U^k] is the preposterior probability density of π(k+1) (Raiffa and Schlaifer, 1972). Using the exact density in (22) would require numerical integration. This is avoided by using a two-point delta function density, as in Wenk and Bar-Shalom (1980) and Wenk (1981).

As the vehicles' discrimination capability increases, the preposterior density becomes bimodal, largely concentrated around two distinct locations, say π_1(k+1) and π_2(k+1). The approximate preposterior density can then be taken as

$$ p[\pi(k+1) \mid Z^k, \beta^k, U^k] \approx \pi(k)\,\delta[\pi(k+1) - \pi_1(k+1)] + [1-\pi(k)]\,\delta[\pi(k+1) - \pi_2(k+1)] \tag{23} $$

where the delta function locations π_1(k+1) and π_2(k+1) satisfy

$$ 0 \le \pi_2(k+1) \le \pi(k) \le \pi_1(k+1) \le 1 \tag{24} $$

The locations π_1(k+1) and π_2(k+1) are obtained by matching the first two moments of the approximate density (23) to the true preposterior moments of π(k+1). The explicit expressions for π_1 and π_2 are derived in Appendix A (Wenk, 1981). Substitute this simple preposterior density (23) in (22), and assume that the minimization in (22) occurs when L(k+1) = L̄(k+1), which represents future controls of the constrained DUL type. This gives, approximately, the expected cost-to-go resulting from a particular control decision u(k) (Wenk and Bar-Shalom, 1980):

$$ E[J^*(k+1) \mid Z^k, \beta^k, U^k] \approx \pi(k)\,\pi_1(k+1)\,J_{11}(k+1) + \pi(k)\,[1-\pi_1(k+1)]\,J_{12}(k+1) + [1-\pi(k)]\,\pi_2(k+1)\,J_{21}(k+1) + [1-\pi(k)]\,[1-\pi_2(k+1)]\,J_{22}(k+1) \tag{25} $$

where

$$ J_{mj}(k+1) = E[C(k+1) \mid Z^k, \beta^k, U^k, L_{mj}(k+1)] \approx \min_{L(k+1)} E[C(k+1) \mid Z^k, \beta^k, U^k, \pi_m(k+1), L(k+1), \theta=j] \tag{26} $$
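The explicit expressions of Appendix A (Wenk, 1981) are not reproduced in this section. The following is a minimal sketch of the moment matching behind (23) and (24). It assumes that the preposterior variance σ² of π(k+1) is already available, and it treats that variance as an input.

- Mean condition: π(k)π_1 + [1-π(k)]π_2 = π(k). This holds because the posterior probability is a martingale.
- Variance condition: π(k)[π_1 - π(k)]² + [1-π(k)][π_2 - π(k)]² = σ².
- Closed-form solution: π_1 = π(k) + σ√((1-π(k))/π(k)) and π_2 = π(k) - σ√(π(k)/(1-π(k))).

The function name is illustrative.

```python
import math

def delta_locations(pi_k, prepost_var):
    """Locations pi_1(k+1) >= pi(k) >= pi_2(k+1) of the two-point density (23).

    The weights pi(k) and 1 - pi(k) are fixed by (23); matching the mean to
    pi(k) and the variance to prepost_var gives closed-form locations.
    """
    if not 0.0 < pi_k < 1.0:
        raise ValueError("pi(k) must lie strictly between 0 and 1")
    # pi(k)(1 - pi(k)) is the largest variance a [0, 1]-valued quantity with
    # mean pi(k) can have; it corresponds to locations 1 and 0 (full learning).
    if not 0.0 <= prepost_var <= pi_k * (1.0 - pi_k):
        raise ValueError("prepost_var must lie in [0, pi(k)(1 - pi(k))]")
    pi1 = pi_k + math.sqrt(prepost_var * (1.0 - pi_k) / pi_k)
    pi2 = pi_k - math.sqrt(prepost_var * pi_k / (1.0 - pi_k))
    return pi1, pi2

pi1, pi2 = delta_locations(0.5, 0.09)
print(pi1, pi2)  # approximately 0.8 and 0.2
```

A zero preposterior variance (no expected learning) collapses both locations onto π(k). The maximum variance places them at 1 and 0, so (24) holds throughout.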

The nominal sequences of control parameters L_mj(k+1), m, j = 1, 2, are given by

(i) π(k+1) = π_m(k+1) as the sufficient statistic for θ at k+1; and

(ii) subsequent nominal posterior probabilities π_mj(i) for i ≥ k+2, representing the probability at time i that the first track is from the target, when π(k+1) = π_m(k+1) and θ = j.

Details of the nominal posterior probability generation and of the computation of J_mj are contained in the next section.

4. GENERATION OF THE NOMINAL PARAMETERS AND THE COST-TO-GO

The intercept time N is not necessarily the same for the target and for the decoy. For each track, N is a complicated function of several quantities: the states of the vehicle x_l(k) and of the interceptor x_I(k), the future controls to be applied, and the process noise yet to come. To obtain a solution of this nonlinear stochastic control problem, N is taken to be the same for both tracks. It is estimated as the minimum number of sampling intervals, including k, in which the interceptor can intercept either of the two vehicles while respecting its control effort bound and its speed limit. Clearly, N is re-estimated at each time k, and the corresponding estimate is N(k).

The nominal sequences of future posterior probabilities π_mj(i), m, j = 1, 2, are generated by constructing a future observation and control sequence. This sequence is based on the statistical information contained in the approximate preposterior density (23), which in turn is a function of the control u(k). At time k, the nominal values for time k+1 and for the path mj are obtained as follows:

$$ N_{mj}(k+1) = N(k) \tag{27} $$

$$ \pi_{mj}(k+1) = \pi_m(k+1) \tag{28} $$

$$ x_{lmj}(k+1) = A_l\,\hat x_l(k|k), \qquad l = 1, 2 \tag{29} $$

$$ x_{Imj}(k+1) = A_I\,\hat x_I(k|k) + B\,u(k) \tag{30} $$

Consider the nominal optimal control for the interceptor u_lmj(k+1), where θ = j, π(k+1) = π_m(k+1), and the interceptor takes the lth track as the track from the target. This control is given by the solution of the LQG problem for the estimated terminal time N_mj(k+1). If this optimal control exceeds the bound (11), the appropriate bound is used. The nominal DUL control for the interceptor at time k+1 is then given by

$$ \bar u_{mj}(k+1) = \pi_{mj}(k+1)\,u_{1mj}(k+1) + [1 - \pi_{mj}(k+1)]\,u_{2mj}(k+1) \tag{31} $$

If the resulting nominal speed of the interceptor at time k+2 exceeds the limit (12), the magnitude of this nominal control ū_mj(k+1) is reduced. This is done by considering the control t·ū_mj(k+1), 0 < t < 1, which leaves the direction of the desired nominal control unchanged, so that the interceptor moves at its speed limit.
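The constrained nominal control (31), together with the bound (11) and the speed-limit scaling just described, can be sketched as follows. Several pieces are assumptions and are not taken from the paper:

- the interceptor model in the usage lines (a planar double integrator with unit sampling time, whose speed is the velocity magnitude);
- the function names;
- the bisection used to find t, which is one simple way to do it.

```python
import numpy as np

def nominal_dul_control(pi_mj, u1, u2, u_max, x_next, A_I, B, speed, v_max, iters=60):
    """Nominal DUL control (31) with the effort bound (11) and speed scaling.

    u1, u2: model-optimal controls (track 1 or track 2 taken as the target);
    x_next: nominal interceptor state at k+1; speed(x): speed at state x.
    """
    # Clip each model-optimal control to the effort bound; a convex combination
    # of two controls inside the box |u_n| <= u_n^max stays inside the box.
    u = (pi_mj * np.clip(u1, -u_max, u_max)
         + (1.0 - pi_mj) * np.clip(u2, -u_max, u_max))
    if speed(A_I @ x_next + B @ u) <= v_max:
        return u
    # Shrink t in (0, 1), keeping the direction of u, until the speed at k+2
    # reaches the limit. Bisection assumes the limit holds at t = 0 and is
    # crossed once on [0, 1] (true when speed is convex in t, as below).
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if speed(A_I @ x_next + B @ (mid * u)) <= v_max:
            lo = mid
        else:
            hi = mid
    return lo * u

# Hypothetical planar double integrator: state [px, py, vx, vy], unit sampling time.
T = 1.0
A_I = np.array([[1, 0, T, 0], [0, 1, 0, T], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
B = np.array([[T**2 / 2, 0], [0, T**2 / 2], [T, 0], [0, T]])

def speed(x):
    return float(np.hypot(x[2], x[3]))

u = nominal_dul_control(0.7, np.array([2.0, 0.0]), np.array([2.0, 0.0]), np.array([5.0, 5.0]),
                        np.array([0.0, 0.0, 1.0, 0.0]), A_I, B, speed, v_max=2.0)
print(u)  # approximately [1, 0]: the speed at k+2 is held at the limit of 2
```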

Observe that the nominal feature measurements at time k+1 are not generated, since the information in these features is contained in π_1(k+1) and π_2(k+1).

For time i ≥ k+2, the quantities x_lmj(i), x_Imj(i), β_lmj(i), N_mj(i) and π_mj(i) are obtained recursively as follows:

$$ x_{lmj}(i) = A_l\,x_{lmj}(i-1), \qquad l = 1, 2 \tag{32} $$

$$ x_{Imj}(i) = A_I\,x_{Imj}(i-1) + B\,\bar u_{mj}(i-1) \tag{33} $$

$$ \beta_{lmj}(i) = f[\phi_l, x_{lmj}(i), x_{Imj}(i)], \qquad l = 1, 2 \tag{34} $$
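Recursions (32)-(34) propagate three things: the nominal vehicle states, the interceptor state under the nominal control, and the noise-free nominal features. The sketch below takes the nominal control sequence as given. In the paper, ū_mj(i) is itself recomputed at each step from the nominal posterior probabilities (35), which this sketch does not model. The feature function f is supplied by the caller, and all names are illustrative.

```python
import numpy as np

def propagate_nominal(x_vehicles, x_interceptor, A_vehicles, A_I, B,
                      u_nominal, feature, phis):
    """Noise-free nominal propagation in the spirit of (32)-(34).

    x_vehicles, A_vehicles: per-vehicle nominal states and matrices A_l;
    u_nominal: nominal interceptor controls, one per step;
    feature(phi, x_vehicle, x_interceptor): noise-free feature model f.
    Returns a list of (vehicle states, interceptor state, features) per step.
    """
    xs = [np.asarray(x, dtype=float) for x in x_vehicles]
    x_I = np.asarray(x_interceptor, dtype=float)
    path = []
    for u in u_nominal:
        xs = [A @ x for A, x in zip(A_vehicles, xs)]                # (32)
        x_I = A_I @ x_I + B @ np.asarray(u, dtype=float)            # (33)
        betas = [feature(phi, x, x_I) for phi, x in zip(phis, xs)]  # (34)
        path.append((xs, x_I, betas))
    return path

# Hypothetical one-dimensional example: stationary vehicles, feature = phi * range.
one = np.eye(1)
path = propagate_nominal([[10.0], [12.0]], [0.0], [one, one], one, one,
                         [[2.0], [2.0]], lambda phi, x, xI: phi * abs(x[0] - xI[0]),
                         [1.0, 2.0])
print(path[-1][1], path[-1][2])  # interceptor at [4.], features [6.0, 16.0]
```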




Fig. 1. Projection (b) of the length feature φ_l of the lth vehicle (a) on the interceptor's (I) line-of-sight.


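$$\beta_l(k) = \phi_l\,\lvert\sin[\gamma_l(k) - \psi_l(k)]\rvert + \alpha_l(k), \quad l = 1, 2, \quad k = 1, 2, \ldots \qquad (46)$$

where

$$\gamma_l(k) = \tan^{-1}\frac{y_l(k) - y_I(k)}{x_l(k) - x_I(k)}, \qquad -\frac{\pi}{2} < \gamma_l(k) < \frac{\pi}{2} \qquad (47)$$

and

$$\psi_l(k) = \tan^{-1}\frac{\dot{y}_l(k)}{\dot{x}_l(k)}, \qquad -\frac{\pi}{2} < \psi_l(k) < \frac{\pi}{2}. \qquad (48)$$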


Here, although the noise $\alpha_l(k)$ (assumed Gaussian) has the real line as its support, this is an approximation for this example, since $\beta_l(k)$ is always non-negative. This approximation is acceptable if $\phi_l \gg \sigma_l$.
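The noise-free part of the feature model (46)-(48) can be evaluated with the sketch below, which also shows why discrimination is easiest on the broadside. It uses `math.atan2` instead of the restricted $\tan^{-1}$ of (47)-(48); the two differ only by multiples of $\pi$, which leaves $\lvert\sin(\gamma_l - \psi_l)\rvert$ unchanged. The state ordering $[x, \dot{x}, y, \dot{y}]$ and the function name are assumptions.

```python
import math


def projected_length(phi, x_vehicle, x_interceptor):
    """phi * |sin(gamma - psi)|, the noise-free part of (46).

    gamma: line-of-sight angle from the interceptor to the vehicle, cf. (47)
    psi:   vehicle heading from its velocity, cf. (48)
    States are [x, xdot, y, ydot] (assumed ordering).
    """
    gamma = math.atan2(x_vehicle[2] - x_interceptor[2], x_vehicle[0] - x_interceptor[0])
    psi = math.atan2(x_vehicle[3], x_vehicle[1])
    return phi * abs(math.sin(gamma - psi))


origin = [0.0, 0.0, 0.0, 0.0]
broadside = projected_length(22.0, [10000.0, 0.0, 0.0, 200.0], origin)  # LOS normal to heading
endfire = projected_length(22.0, [10000.0, 200.0, 0.0, 0.0], origin)    # LOS along heading
print(round(broadside, 6), round(endfire, 6))  # prints 22.0 0.0
```

At endfire both vehicles project to nearly zero length, so a feature noise standard deviation of 2 m (the value used in Section 6) swamps the 2 m length difference; at broadside the full difference $\phi_1 - \phi_2$ is visible.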

Now, three items will complete the discussion of this example: the estimation of the posterior probability $\pi(k+1)$ when $z(k+1)$ and $\beta(k+1)$ are available (i.e. to update the system); the computation of the nominal posterior probability $\pi_{mj}(i)$ (35); and the determination of the one-step predicted value of $\beta(k+1)$, i.e. $\bar{\beta}_j$ (A.4), together with the associated covariance $S_j$ (A.7). Since the feature measurements (46) are a highly nonlinear function of the states, the conditional joint density of $z(k+1)$, $\beta(k+1)$ in (19) is not directly available. Observe that at time $k$ we do not compute $\pi(k+1)$ using (19); rather, we compute $\pi_1(k+1)$ and $\pi_2(k+1)$ using (A.10) and (A.11). Hence, to obtain $\pi(k+1)$ at time $k+1$, when the measurements $z(k+1)$ and $\beta(k+1)$ are available, we rewrite (19) as


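$$\pi(k+1) = \left\{1 + \frac{1 - \pi(k)}{\pi(k)} \cdot \frac{p[\beta(k+1) \mid Z^{k+1}, \beta^{k}, U^{k}, \theta = 2]}{p[\beta(k+1) \mid Z^{k+1}, \beta^{k}, U^{k}, \theta = 1]}\right\}^{-1} \qquad (49)$$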


Here, if the conditional density of $\beta(k+1)$ in (49) is determined mainly by the noise characteristics of $\alpha(k+1)$ (otherwise, a better approximation of this conditional density has to be obtained; this is omitted in this example so as not to deviate from the main theme of this paper), then this density is approximately given by


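$$p[\beta(k+1) \mid Z^{k+1}, \beta^{k}, U^{k}, \theta = j] \approx \mathcal{N}\big(\beta(k+1);\ E[\beta(k+1) \mid Z^{k+1}, \beta^{k}, U^{k}, \theta = j],\ \mathrm{diag}(\sigma_1^2, \sigma_2^2)\big), \quad j = 1, 2 \qquad (50)$$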


where

$$E[\beta_l(k+1) \mid Z^{k+1}, \beta^{k}, U^{k}, \theta = j] \approx \phi_{lj}\,\lvert\sin[\gamma_l(k+1|k+1) - \psi_l(k+1|k+1)]\rvert, \quad l = 1, 2. \qquad (51)$$


Here, $\phi_{lj}$ is as in (34), and $\gamma_l(k+1|k+1)$, $\psi_l(k+1|k+1)$ are approximated as


$$\gamma_l(k+1|k+1) = \tan^{-1}\frac{\hat{y}_l(k+1|k+1) - \hat{y}_I(k+1|k+1)}{\hat{x}_l(k+1|k+1) - \hat{x}_I(k+1|k+1)}, \quad l = 1, 2 \qquad (52)$$


and 


with 


$$\hat{x}_l(k+1|k+1) = E[x_l(k+1) \mid Z^{k+1}, U^{k}], \quad l = 1, 2 \qquad (54)$$

$$\hat{x}_I(k+1|k+1) = E[x_I(k+1) \mid Z^{k+1}, U^{k}] \qquad (55)$$

being obtained using Kalman filters.
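A minimal sketch of the measurement update (49) under the Gaussian approximation (50), with the predicted features of (51) supplied by the caller. It works with the log-likelihood ratio so that the exponentials cannot underflow once the tracks are well separated; the layout `beta_pred[j][l]` (predicted feature of track $l$ under $\theta = j+1$) and the function name are illustrative assumptions.

```python
import math


def update_target_probability(pi_prev, beta_meas, beta_pred, sigma2):
    """Posterior probability pi(k+1) that track 1 is the target, cf. (49)-(51).

    pi_prev   -- pi(k), strictly between 0 and 1
    beta_meas -- measured features [beta_1(k+1), beta_2(k+1)]
    beta_pred -- beta_pred[j][l] = E[beta_l(k+1) | ..., theta = j+1] from (51)
    sigma2    -- feature noise variances [sigma_1^2, sigma_2^2]
    """
    # log p(beta | theta=2) - log p(beta | theta=1); Gaussian normalising constants cancel
    log_ratio = 0.0
    for l in range(2):
        r1 = beta_meas[l] - beta_pred[0][l]
        r2 = beta_meas[l] - beta_pred[1][l]
        log_ratio += (r1 * r1 - r2 * r2) / (2.0 * sigma2[l])
    z = math.log((1.0 - pi_prev) / pi_prev) + log_ratio
    # pi(k+1) = 1 / (1 + exp(z)), evaluated without overflow
    if z >= 0.0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# Broadside measurement matching the theta = 1 prediction, sigma^2 = 4 m^2
p = update_target_probability(0.5, [22.0, 20.0], [[22.0, 20.0], [20.0, 22.0]], [4.0, 4.0])
```

In this example each feature contributes $-2^2/(2 \cdot 4) = -0.5$ to the log-likelihood ratio, so a single broadside look moves $\pi$ from 0.5 to $1/(1 + e^{-1}) \approx 0.73$.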

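Similarly, rewriting (35) to obtain the nominal posterior probability $\pi_{mj}(i)$ gives

$$\pi_{mj}(i) = \left\{1 + \frac{1 - \pi_{mj}(i-1)}{\pi_{mj}(i-1)} \cdot \frac{p[\bar{\beta}_{mj}(i) \mid Z_{mj}^{i}, \beta_{mj}^{i-1}, U_{mj}^{i-1}, \theta = 2]}{p[\bar{\beta}_{mj}(i) \mid Z_{mj}^{i}, \beta_{mj}^{i-1}, U_{mj}^{i-1}, \theta = 1]}\right\}^{-1} \qquad (56)$$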


As in (49), we assume that the conditional distribution of $\bar{\beta}_{mj}(i)$ in (56) is approximately Gaussian. Then, simplifying the expression in $\{\cdot\}$ of (56) gives the equation for $\pi_{mj}(i)$ as


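$$\pi_{mj}(i) \approx \left\{1 + \frac{1 - \pi_{mj}(i-1)}{\pi_{mj}(i-1)} \exp\left[\frac{(-1)^{j}}{2}\,[\bar{\beta}_{m1}(i) - \bar{\beta}_{m2}(i)]' \begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}^{-1} [\bar{\beta}_{m1}(i) - \bar{\beta}_{m2}(i)]\right]\right\}^{-1}. \qquad (57)$$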
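Because the nominal measurement generated under $\theta = j$ coincides with the $\theta = j$ mean, one of the two Gaussian exponents in (56) vanishes and only the sign $(-1)^j$ survives, which is the structure of (57). The sketch below evaluates the closed form and checks it against the direct ratio of Gaussian densities; the argument names are hypothetical, and the two mean vectors are simply supplied by the caller.

```python
import math


def nominal_probability(pi_prev, j, beta_theta1, beta_theta2, sigma2):
    """Closed form (57) with R = diag(sigma2).

    j           -- hypothesis (1 or 2) that generated the nominal measurement
    beta_theta1 -- nominal feature vector under theta = 1
    beta_theta2 -- nominal feature vector under theta = 2
    """
    q = sum((a - b) ** 2 / s for a, b, s in zip(beta_theta1, beta_theta2, sigma2))
    z = math.log((1.0 - pi_prev) / pi_prev) + (-1) ** j * 0.5 * q
    return 1.0 / (1.0 + math.exp(z))


def gaussian_logpdf(x, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def direct_ratio_probability(pi_prev, beta, beta_theta1, beta_theta2, sigma2):
    """The same quantity from (56) with explicit Gaussian densities."""
    log_p1 = sum(gaussian_logpdf(x, m, s) for x, m, s in zip(beta, beta_theta1, sigma2))
    log_p2 = sum(gaussian_logpdf(x, m, s) for x, m, s in zip(beta, beta_theta2, sigma2))
    odds = (1.0 - pi_prev) / pi_prev * math.exp(log_p2 - log_p1)
    return 1.0 / (1.0 + odds)


b1, b2, s2 = [22.0, 20.0], [20.0, 22.0], [4.0, 4.0]
for j, beta in ((1, b1), (2, b2)):
    assert abs(nominal_probability(0.3, j, b1, b2, s2)
               - direct_ratio_probability(0.3, beta, b1, b2, s2)) < 1e-12
```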
Now $\bar{\beta}_j$, defined in (A.4), is obtained similarly to (51) as

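$$E[\beta_l(k+1) \mid Z^{k}, \beta^{k}, U^{k}, \theta = j] \approx \phi_{lj}\,\lvert\sin[\gamma_l(k+1|k) - \psi_l(k+1|k)]\rvert, \quad l = 1, 2 \qquad (58)$$

$$\gamma_l(k+1|k) = \tan^{-1}\frac{\hat{y}_l(k+1|k) - \hat{y}_I(k+1|k)}{\hat{x}_l(k+1|k) - \hat{x}_I(k+1|k)} \qquad (59)$$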


and

$$\psi_l(k+1|k) = \tan^{-1}\frac{\hat{\dot{y}}_l(k+1|k)}{\hat{\dot{x}}_l(k+1|k)} \qquad (60)$$

with

$$\hat{x}_l(k+1|k) = E[x_l(k+1) \mid Z^{k}, \beta^{k}, U^{k}] \qquad (61)$$

and

$$\hat{x}_I(k+1|k) = E[x_I(k+1) \mid Z^{k}, \beta^{k}, U^{k}] \qquad (62)$$

being obtained using Kalman filters.

Finally, $S_j$, the covariance matrix associated with (58), may be taken as


A final remark on the extension of the present work: in the analysis presented for the target/decoy interception problem, it was assumed that the target and the decoy follow the same state models throughout, i.e. the models do not switch to other state models. If, after this algorithm has been activated, any of the vehicles does switch to a different state model, that switch must be detected and the corresponding filter should be reinitialized. Notice that the analysis presented in this work remains valid for the switched model as long as the states propagate according to an equation similar to (1), for example a switch from the nearly constant speed (non-maneuvering) model of (41) to a nearly constant acceleration (maneuvering) model with the state vector

$$\mathbf{x} = [x, \dot{x}, y, \dot{y}, \ddot{x}, \ddot{y}]'. \qquad (64)$$

A simple maneuver detection scheme for tracking a maneuvering target, i.e. a scheme to detect the switching of models, is given by Bar-Shalom and Birmiwal (1982). It was observed there that using suitable state models at all times results in the best tracking performance. By using such a scheme to detect the switching of models and then reinitializing the filter with the switched state model, the present work can readily be extended to the case of the target/decoy changing models.
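One common form of such a detector, in the spirit of Bar-Shalom and Birmiwal (1982), monitors a fading-memory sum of the normalized innovation squared (NIS) of each Kalman filter. The sketch below is an illustration under assumptions: the memory factor `alpha` and the `threshold` are placeholders that would in practice be set from chi-square tail probabilities, and the function name is hypothetical.

```python
import numpy as np


def detect_maneuver(innovations, covariances, alpha, threshold):
    """Return the first sample index at which a maneuver is declared, or None.

    innovations -- Kalman-filter innovations nu(k) (1-D arrays)
    covariances -- matching innovation covariances S(k)
    alpha       -- fading-memory factor in (0, 1) (illustrative)
    threshold   -- detection threshold on the faded statistic (illustrative)
    """
    rho = 0.0
    for k, (nu, S) in enumerate(zip(innovations, covariances)):
        eps = float(nu @ np.linalg.solve(S, nu))  # NIS: nu' S^{-1} nu
        rho = alpha * rho + eps                   # fading-memory sum
        if rho > threshold:
            return k
    return None


S = np.eye(2)
quiet = [np.array([0.5, 0.5])] * 5
jump = quiet + [np.array([5.0, 5.0])]
print(detect_maneuver(quiet, [S] * 5, alpha=0.8, threshold=20.0))  # None
print(detect_maneuver(jump, [S] * 6, alpha=0.8, threshold=20.0))   # 5
```

After a detection, the corresponding filter would be reinitialized with the augmented state (64), as described above.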


6. SIMULATION RESULTS
As an evaluation of this algorithm, the above example was simulated. Two sets of feature lengths were chosen: one with the target and decoy nearly 'identical' and the other corresponding to more separated features. For each set of features, two pairs of distinct trajectories for the two vehicles were considered. Initial values of these trajectories were


The sampling time interval $T$ was taken to be 3 sec. The process noise covariance matrix associated with the interceptor was taken to be zero, while for the two vehicles it was


=  £>2  =  0°j  J(m/s)2.  (67) 

The interceptor's state measurement noise covariance $R_I$ was taken to be zero (the interceptor knows its own state with relatively more certainty, and no Kalman filter is used for the interceptor), while for the two vehicles it was


The feature measurement noise variance was taken to be $\sigma_1^2 = \sigma_2^2 = 4\ \mathrm{m}^2$. Since no information about the target/decoy was available at time 0, $\pi(0) = 0.5$.

The cost matrix $R(i)$ associated with the interceptor control was taken to be the same for all $i$


Vi. 


(69) 
The cost matrix $Q$ associated with the relative position of the terminal target/decoy and interceptor states was taken to be


The interceptor control bound $u_j^{\max}$ was taken to be 25 m/s² for $j = 1, 2$ and for all $i$. The discrete controls $u(k)$ were chosen over a grid of points (controls) 5 m/s² apart in both directions, whose effective direction of acceleration was within 90° of the direction of motion of the target/decoy. The speed limit of the interceptor, $v^{\max}$, was taken to be 250 m/s. The threshold $\pi^*$, which is used to decide about the identities of the tracks, was taken to be 0.499, i.e. the decision about the tracks was made when $\pi(k)$ was greater than 0.999 or smaller than 0.001. After the decision about 'which track is from the target' is made, the bounded optimal control obtained from the solution of the LQG problem for the estimated time-to-go was applied to the interceptor until the designated target was intercepted.
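The discrete control set and the stopping rule described above can be sketched as follows. The grid spacing (5 m/s²), bound (25 m/s²) and threshold $\pi^* = 0.499$ are from the text; keeping the zero control, reading 'within 90°' as a positive dot product with the direction of motion, and the function names are assumptions.

```python
def control_grid(u_max, step, motion_dir):
    """Candidate accelerations on a square grid with spacing `step` inside |u_j| <= u_max,
    keeping those pointing within 90 degrees of `motion_dir` (plus the zero control)."""
    n = int(round(u_max / step))
    grid = []
    for a in range(-n, n + 1):
        for b in range(-n, n + 1):
            ux, uy = a * step, b * step
            forward = ux * motion_dir[0] + uy * motion_dir[1] > 0.0
            if (a, b) == (0, 0) or forward:
                grid.append((ux, uy))
    return grid


def track_decision(pi, pi_star=0.499):
    """Stop the dual phase once |pi - 0.5| >= pi_star (pi = P(track 1 is the target))."""
    if abs(pi - 0.5) < pi_star:
        return None  # keep identifying and probing
    return 1 if pi > 0.5 else 2


controls = control_grid(25.0, 5.0, (1.0, 0.0))
print(len(controls))           # 56: 55 forward-pointing points plus zero
print(track_decision(0.9995))  # 1
```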

Tracks of both vehicles were initialized using the two-point differencing method on the measurements, as in Bar-Shalom and Birmiwal (1982). Initial values of the interceptor state components were taken to be zero.
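Two-point differencing initializes position from the latest measurement and velocity from the difference of the last two. A minimal sketch for the state $[x, \dot{x}, y, \dot{y}]'$, assuming independent position measurements with equal variance $r$ on both axes and neglecting process noise over the first interval; the function name is hypothetical.

```python
import numpy as np


def two_point_init(z0, z1, T, r):
    """Initial estimate and covariance from two position measurements.

    z0, z1 -- measured positions [x, y] at times 0 and T
    r      -- position measurement noise variance per axis (assumed equal)
    """
    x0 = np.array([z1[0], (z1[0] - z0[0]) / T, z1[1], (z1[1] - z0[1]) / T])
    # Per axis: var(pos) = r, cov(pos, vel) = r/T, var(vel) = 2r/T^2
    P_axis = np.array([[r, r / T], [r / T, 2.0 * r / T**2]])
    return x0, np.kron(np.eye(2), P_axis)


x0, P0 = two_point_init([0.0, 0.0], [300.0, 30.0], T=3.0, r=100.0)
```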

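The two sets of feature values considered were $\phi_1 = 22$ m, $\phi_2 = 20$ m and $\phi_1 = 28$ m, $\phi_2 = 20$ m. Observe that the features of the first set effectively differ by less than one standard deviation of the feature measurement noise: the lengths differ by 2 m, which equals $\sigma_l = 2$ m, and the projection factor $\lvert\sin(\gamma_l - \psi_l)\rvert \le 1$ in (46) reduces the observed difference further.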

For each of the four cases, a Monte Carlo simulation of ten runs was performed. It was observed that the interceptor intercepted the true target correctly in all the runs. Figure 2 shows the typical motions of the target, decoy and interceptor, from time zero until the interception took place, for the very close feature set and trajectories 1. Figures 3-5 show these motions for the other three cases. For the same set of random numbers and corresponding to each of the above 40 runs, another set of runs was performed with the target and decoy tracks interchanged. Again, the true target was identified correctly and intercepted in all these runs. Figures 6-9 are the plots corresponding to Figs 2-5, respectively, with $\theta$ changed (target and decoy switched).

Fig. 2. Typical motions of the target, decoy and interceptor for the case of $\theta = 1$, $\phi_1 = 22$ m, $\phi_2 = 20$ m and trajectories 1. Here, the distance between two consecutive similar symbols is five sampling intervals (15 sec), and • represents the locations of the two vehicles and the interceptor when the decision about the tracks is made. Legend: ∗ target; + decoy; ◇ interceptor.

Fig. 3. Typical motions for the case of $\theta = 1$, $\phi_1 = 28$ m, $\phi_2 = 20$ m and trajectories 1.

Fig. 4. Typical motions for the case of $\theta = 1$, $\phi_1 = 22$ m, $\phi_2 = 20$ m and trajectories 2.

Fig. 5. Typical motions for the case of $\theta = 1$, $\phi_1 = 28$ m, $\phi_2 = 20$ m and trajectories 2.

From these figures, we observe that the interceptor takes longer to decide about the tracks when it is on the endfire than when it is on the broadside. This is because the feature measurement noise dominates in the former case, where the projected lengths $\phi_l\lvert\sin(\gamma_l - \psi_l)\rvert$ of both vehicles are small. When the target and decoy are more different, it takes less time to decide about the tracks, which is intuitively obvious. When the target and decoy are nearly identical, the interceptor does not follow them directly. Instead, it takes a course such that at the time of interception the last nominal a posteriori probability $\pi_{mj}(N-1)$ is close to its extreme value (here we have the dual effect). To achieve this goal in minimum time, the interceptor tries to be on the broadside of the target/decoy. In case the target and decoy are easily discriminable, the interceptor follows the vehicles directly, because it anticipates that future learning will guide it correctly to the true target.


Fig. 6. Typical motions for the case of $\theta = 2$, $\phi_1 = 20$ m, $\phi_2 = 22$ m and trajectories 1.

Fig. 7. Typical motions for the case of $\theta = 2$, $\phi_1 = 20$ m, $\phi_2 = 28$ m and trajectories 1.

Fig. 8. Typical motions for the case of $\theta = 2$, $\phi_1 = 20$ m, $\phi_2 = 22$ m and trajectories 2.


The relative importance of the terminal state cost over the interceptor control cost was examined by changing all diagonal components of $Q$ to 0.05 m$^{-2}$. For this $Q$, with the rest of the parameters unchanged, a Monte Carlo simulation of ten runs was performed for each set of trajectories and features corresponding to Figs 2, 4, 6 and 8. It was observed that the true target was intercepted correctly in all these runs. Figures 10-13 are the respective plots giving the typical motion of the target, decoy and interceptor. By comparing these two sets of figures, it is clear that the interceptor tried to be more on the broadside in the former case (and hence took less time to decide about the track identities) so that the last nominal posterior probability $\pi_{mj}(N-1)$ was closer to its extreme value. This is because the terminal state cost was relatively more dominant over the interceptor control cost in the former case than in the latter, i.e. the controller was more willing to expend the additional fuel needed to take the more energetic trajectory in the former case, and this resulted in faster convergence.

Fig. 10. Typical motions for the case of $\theta = 1$, $\phi_1 = 22$ m, $\phi_2 = 20$ m, trajectories 1 and reduced $Q$.

Fig. 11. Typical motions for the case of $\theta = 1$, $\phi_1 = 22$ m, $\phi_2 = 20$ m, trajectories 2 and reduced $Q$.

Fig. 12. Typical motions for the case of $\theta = 2$, $\phi_1 = 20$ m, $\phi_2 = 22$ m, trajectories 1 and reduced $Q$.

The algorithm was run in Fortran IV on an IBM 3081 D. The code comprised around 1300 statements, but it included overhead (trajectory and noise generation) and hence was not efficient. The memory requirement was approximately 250K and the average CPU time for each run was approximately 1 min.


7. CONCLUSIONS

An adaptive dual-control guidance algorithm for intercepting a moving target has been developed for the situation where the target is using a defensive decoy in a stochastic environment. At each time step, the interceptor chooses its bounded control, and hence its trajectory, such that it can differentiate between the true target of interest and the decoy with the aid of the expected future state observations and feature measurements, while at the same time approaching the target/decoy. To reduce the computational load, an approximate solution of the stochastic dynamic programming equation is obtained by performing the preposterior analysis. The algorithm developed is especially useful if the cost associated with the terminal miss distance between the true target and the interceptor is relatively high compared to the interceptor control cost. The case of the target/decoy changing their state models is also considered. The simulation studies have shown the effectiveness of the scheme: in all simulated runs, the true target was identified and intercepted.


REFERENCES

Bar-Shalom, Y. and K. Birmiwal (1982). Variable dimension filter for maneuvering target tracking. IEEE Trans. Aerospace Electron. Systems, AES-18, 621.

Bar-Shalom, Y. and E. Tse (1974). Dual effect, certainty equivalence, and separation in stochastic control. IEEE Trans. Auto. Control, AC-19, 494.

Bar-Shalom, Y. and E. Tse (1976). Concepts and methods in stochastic control. In C. T. Leondes (Ed.), Control and Dynamic Systems: Advances in Theory and Applications, Vol. 12. Academic Press, New York.

Bellman, R. (1961). Adaptive Control Processes: A Guided Tour. Princeton University Press, Princeton, New Jersey.

Bryson, A. E. and Y. C. Ho (1969). Applied Optimal Control, Ch. 5. Ginn and Company, Waltham, Massachusetts.

Casler, Jr., R. J. (1978). Dual control guidance for homing interceptors with angle-only measurements. AIAA J. Guidance and Control, 1, 61.

Chang, C. B. (1979). Ballistic trajectory estimation with angle-only measurement. Proc. 18th IEEE Conf. on Decision and Control, 2, 612, Florida.

Deshpande, J. G., T. N. Upadhyay and D. G. Lainiotis (1973). Adaptive control of linear stochastic systems. Automatica, 9, 107.

Dowdle, J. R., M. Athans, S. W. Gully and A. S. Willsky (1982). An optimal control and estimation algorithm for missile endgame guidance. Proc. 21st IEEE Conf. on Decision and Control, 3, 1128, Florida.

Feldbaum, A. A. (1965). Optimal Control Systems. Academic Press, New York.

Lee, G. K. F. (1982). Investigation of time-to-go algorithms for air-to-air missiles. Proc. 1st American Control Conf., 3, 988, Virginia.

Murtaugh, S. A. and H. E. Criel (1966). Fundamentals of proportional navigation. IEEE Spectrum, December, 75.

Newell, H. E., Jr. (1945). Guided missile kinematics. Naval Res. Lab., Washington, D.C., Report R-2538.

Pastrick, H. L., S. M. Seltzer and M. E. Warren (1981). Guidance laws for short-range tactical missiles. AIAA J. Guidance and Control, 4.

Raiffa, H. and R. Schlaifer (1972). Applied Statistical Decision Theory. M.I.T. Press, Cambridge, Massachusetts.

Riggs, T. (1979). An overview-optimal control and estimation for tactical missiles. Nat. Aero. and Electr. Conf., p. 752.

Saridis, G. N. (1977). Self-Organizing Control of Stochastic Systems. Dekker, New York.

Speyer, J. L. and D. G. Hull (1982). Estimation enhancement by trajectory modulation for homing missiles. Proc. 1st American Control Conf., 3, 978, Virginia.

Tse, E. and Y. Bar-Shalom (1973). An actively adaptive control for discrete-time systems with random parameters. IEEE Trans. Auto. Control, AC-18, 109.

Tse, E. and Y. Bar-Shalom (1975). Adaptive dual control for stochastic nonlinear systems with free end-time. IEEE Trans. Auto. Control, AC-20, 670.

Wenk, C. J. and Y. Bar-Shalom (1980). A multiple model adaptive dual control algorithm for stochastic systems with unknown parameters. IEEE Trans. Auto. Control, AC-25, 703.

Wenk, C. J. (1981). Decision strategies with dual effect for inseparable stochastic control problems with continuous and discrete uncertainty. Ph.D. Thesis, University of Connecticut, Dept. of Electrical Engineering and Computer Science.

APPENDIX A: DERIVATION OF THE APPROXIMATE PREPOSTERIOR DENSITY

The locations $\pi_1(k+1)$ and $\pi_2(k+1)$ of the preposterior density (23) are obtained by matching the first two moments of (23) to the true preposterior moments of $\pi(k+1)$, viz. $E[\pi(k+1) \mid Z^k, \beta^k, U^k]$ and $E[\pi^2(k+1) \mid Z^k, \beta^k, U^k]$. From the Fundamental Theorem on Expectation and (19), we have

$$E[\pi(k+1) \mid Z^{k}, \beta^{k}, U^{k}] = \pi(k). \qquad (A.1)$$

Using the total probability theorem, the true second moment of $\pi(k+1)$ can be rewritten as

$$\overline{\pi^2}(k+1) \triangleq E[\pi^2(k+1) \mid Z^{k}, \beta^{k}, U^{k}] = \pi(k)\,E[\pi^2(k+1) \mid Z^{k}, \beta^{k}, U^{k}, \theta = 1] + [1 - \pi(k)]\,E[\pi^2(k+1) \mid Z^{k}, \beta^{k}, U^{k}, \theta = 2]. \qquad (A.2)$$

Now consider $E[\pi^2(k+1) \mid Z^k, \beta^k, U^k, \theta = j]$. In view of (19), and ignoring the variation of $\pi(k+1)$ with respect to $z(k+1)$, $\pi(k+1)$ is an explicit function of $\beta(k+1)$:

$$\pi(k+1) = \pi[\beta(k+1)] \triangleq \pi[\beta]. \qquad (A.3)$$

Expanding $\pi(k+1)$ to second order about

$$\bar{\beta}_j \triangleq E[\beta(k+1) \mid Z^{k}, \beta^{k}, U^{k}, \theta = j] \qquad (A.4)$$

gives

$$\pi[\beta] \approx \pi_j + \nabla\pi_j'\,(\beta - \bar{\beta}_j) + \tfrac{1}{2}(\beta - \bar{\beta}_j)'\,\nabla^2\pi_j\,(\beta - \bar{\beta}_j) \qquad (A.5)$$
where $\pi_j$, $\nabla\pi_j$ and $\nabla^2\pi_j$ are the function, gradient and Hessian of (A.3), respectively, evaluated at $\bar{\beta}_j$. Then, since $\beta - \bar{\beta}_j$ is approximately Gaussian, equation (A.5) gives

$$E[\pi^2(k+1) \mid Z^{k}, \beta^{k}, U^{k}, \theta = j] \approx \pi_j^2 + \pi_j\,\mathrm{tr}(\nabla^2\pi_j S_j) + \nabla\pi_j' S_j \nabla\pi_j + \tfrac{1}{4}[\mathrm{tr}(\nabla^2\pi_j S_j)]^2 + \tfrac{1}{2}\mathrm{tr}[(\nabla^2\pi_j S_j)^2] \qquad (A.6)$$

where tr denotes the trace operator and $S_j \triangleq S(k+1|k, \theta = j)$ is the residual covariance associated with $\bar{\beta}_j$,

$$S(k+1|k, \theta = j) = \mathrm{cov}[\beta(k+1) \mid Z^{k}, \beta^{k}, U^{k}, \theta = j]. \qquad (A.7)$$
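The moment formula (A.6) is exact when $\pi[\beta]$ is quadratic and $\beta - \bar{\beta}_j$ is Gaussian, since the odd moments vanish and $E[(e'He)^2] = [\mathrm{tr}(HS)]^2 + 2\,\mathrm{tr}[(HS)^2]$ for $e \sim N(0, S)$ and symmetric $H$. The sketch below codes (A.6) and checks it against a Monte Carlo estimate for an arbitrary quadratic; the numbers are illustrative only.

```python
import numpy as np


def second_moment(pi_j, grad, hess, S):
    """Right-hand side of (A.6): E[pi^2] for the quadratic expansion (A.5)
    with beta - beta_bar_j ~ N(0, S)."""
    HS = hess @ S
    tr = np.trace(HS)
    return pi_j**2 + pi_j * tr + grad @ S @ grad + 0.25 * tr**2 + 0.5 * np.trace(HS @ HS)


rng = np.random.default_rng(0)
a, g = 0.5, np.array([0.1, -0.05])
H = np.array([[0.02, 0.01], [0.01, -0.03]])
S = np.array([[4.0, 1.0], [1.0, 2.0]])
e = rng.multivariate_normal(np.zeros(2), S, size=400_000)
f = a + e @ g + 0.5 * np.einsum("ni,ij,nj->n", e, H, e)
print(second_moment(a, g, H, S), np.mean(f**2))  # analytic vs. Monte Carlo
```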

This completes the computation of the second moment (A.2). Equating (A.1) and (A.2) to the respective moments of (23) gives

$$\pi(k) = \pi(k)\,\pi_1(k+1) + [1 - \pi(k)]\,\pi_2(k+1) \qquad (A.8)$$

$$\overline{\pi^2}(k+1) = \pi(k)\,\pi_1^2(k+1) + [1 - \pi(k)]\,\pi_2^2(k+1). \qquad (A.9)$$

Using (24), (A.8) and (A.9) yields the desired two-point preposterior density locations

$$\pi_1(k+1) = \pi(k) + \left[\frac{1 - \pi(k)}{\pi(k)}\right]^{1/2}\left[\overline{\pi^2}(k+1) - \pi^2(k)\right]^{1/2} \qquad (A.10)$$

$$\pi_2(k+1) = \frac{\pi(k)}{1 - \pi(k)}\,[1 - \pi_1(k+1)]. \qquad (A.11)$$
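Given $\pi(k)$ and the second moment $\overline{\pi^2}(k+1)$ from (A.2), the two support points follow from (A.10)-(A.11). The sketch below also verifies that the two-point density with masses $\pi(k)$ and $1 - \pi(k)$ reproduces both moments. Clipping the variance to $[0, \pi(k)(1 - \pi(k))]$, which keeps $\pi_1$ in $[\pi(k), 1]$, is an added safeguard for the approximate (A.6) and not part of the paper; the function name is hypothetical.

```python
import math


def preposterior_points(pi_k, second_moment):
    """Locations pi_1(k+1), pi_2(k+1) from (A.10)-(A.11)."""
    var = second_moment - pi_k**2
    var = min(max(var, 0.0), pi_k * (1.0 - pi_k))  # safeguard (assumption)
    pi1 = pi_k + math.sqrt((1.0 - pi_k) / pi_k * var)
    pi2 = pi_k / (1.0 - pi_k) * (1.0 - pi1)
    return pi1, pi2


pi_k, m2 = 0.3, 0.3**2 + 0.1
pi1, pi2 = preposterior_points(pi_k, m2)
mean = pi_k * pi1 + (1 - pi_k) * pi2
second = pi_k * pi1**2 + (1 - pi_k) * pi2**2
print(pi1, pi2, mean, second)  # mean and second match pi(k) and m2 up to rounding
```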


APPENDIX B: THE ALGORITHM


Step 1

Initialize $\pi(0)$ and $\hat{x}_I(0|0)$, $\hat{x}_1(0|0)$, $\hat{x}_2(0|0)$ along with their respective covariances. Obtain the predicted states $\hat{x}_1(1|0)$, $\hat{x}_2(1|0)$. Define $k = 0$.

Step 2

Is it desired to obtain the optimal control for target identification and interception (true only when $\lvert\pi(k) - 0.5\rvert < \pi^*$)? If yes, go to Step 3; otherwise, terminate.


Step 3

Choose a feasible control $u(k)$, i.e. a control that satisfies the constraints $\lvert u_j(k)\rvert \le u_j^{\max}(k)$ and $v_I(k+1) \le v^{\max}$. Obtain $\hat{x}_I(k+1|k)$:

$$\hat{x}_I(k+1|k) = A_I\,\hat{x}_I(k|k) + B\,u(k). \qquad (B.1)$$

Step 4

Compute $\bar{\beta}_j = E[\beta(k+1) \mid Z^k, \beta^k, U^k, \theta = j]$ and the associated covariance $S_j$. For the example considered,

$$\bar{\beta}_j = \begin{bmatrix} \phi_{1j}\,\lvert\sin[\gamma_1(k+1|k) - \psi_1(k+1|k)]\rvert \\ \phi_{2j}\,\lvert\sin[\gamma_2(k+1|k) - \psi_2(k+1|k)]\rvert \end{bmatrix} \qquad (B.2)$$

where $\gamma_l(k+1|k)$ and $\psi_l(k+1|k)$, $l = 1, 2$, are given by equations (59) and (60), respectively, along with equations (61) and (62). The expression for $S_j$ is given by (63).

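Step 5

Compute $\pi_1(k+1)$ and $\pi_2(k+1)$:

$$\pi_1(k+1) = \pi(k) + \left[\frac{1 - \pi(k)}{\pi(k)}\right]^{1/2}\left[\overline{\pi^2}(k+1) - \pi^2(k)\right]^{1/2} \qquad (B.3)$$

$$\pi_2(k+1) = \frac{\pi(k)}{1 - \pi(k)}\,[1 - \pi_1(k+1)] \qquad (B.4)$$

$$\pi_j = \left\{1 + \frac{1 - \pi(k)}{\pi(k)} \exp\left[\frac{(-1)^{j}}{2}\,(\bar{\beta}_1 - \bar{\beta}_2)' \begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}^{-1} (\bar{\beta}_1 - \bar{\beta}_2)\right]\right\}^{-1} \qquad (B.5)$$

$$\nabla\pi_j = \pi_j(1 - \pi_j) \begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}^{-1} (\bar{\beta}_1 - \bar{\beta}_2) \qquad (B.6)$$

$$\nabla^2\pi_j = \pi_j(1 - \pi_j)(1 - 2\pi_j) \begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}^{-1} (\bar{\beta}_1 - \bar{\beta}_2)(\bar{\beta}_1 - \bar{\beta}_2)' \begin{bmatrix}\sigma_1^2 & 0\\ 0 & \sigma_2^2\end{bmatrix}^{-1} \qquad (B.7)$$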
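With Gaussian feature likelihoods that share the covariance $\mathrm{diag}(\sigma_1^2, \sigma_2^2)$, the quadratic terms cancel and $\pi[\beta]$ is a logistic function of a linear form in $\beta$, which is why the gradient and Hessian in (B.6)-(B.7) are rank-one. The sketch below codes (B.5)-(B.7) and checks the gradient against central finite differences; the names are hypothetical.

```python
import numpy as np


def pi_of_beta(beta, pi_k, b1, b2, sigma2):
    """pi[beta] of (A.3) with Gaussian likelihoods N(b_j, diag(sigma2)), j = 1, 2."""
    w = 1.0 / np.asarray(sigma2)
    log_odds = (np.log((1.0 - pi_k) / pi_k)
                - 0.5 * np.sum(w * (beta - b2) ** 2)
                + 0.5 * np.sum(w * (beta - b1) ** 2))
    return 1.0 / (1.0 + np.exp(log_odds))


def pi_derivatives(j, pi_k, b1, b2, sigma2):
    """pi_j, its gradient and its Hessian at beta = b_j, as in (B.5)-(B.7)."""
    b1, b2 = np.asarray(b1, float), np.asarray(b2, float)
    p = pi_of_beta(b1 if j == 1 else b2, pi_k, b1, b2, sigma2)
    g = (b1 - b2) / np.asarray(sigma2)  # R^{-1} (beta_bar_1 - beta_bar_2)
    grad = p * (1.0 - p) * g
    hess = p * (1.0 - p) * (1.0 - 2.0 * p) * np.outer(g, g)
    return p, grad, hess


b1, b2, s2, pk = np.array([22.0, 20.0]), np.array([20.0, 22.0]), [4.0, 4.0], 0.4
p, grad, hess = pi_derivatives(1, pk, b1, b2, s2)
h = 1e-5
fd_grad = np.array([(pi_of_beta(b1 + h * d, pk, b1, b2, s2)
                     - pi_of_beta(b1 - h * d, pk, b1, b2, s2)) / (2 * h) for d in np.eye(2)])
print(np.allclose(fd_grad, grad, atol=1e-8))  # True
```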

Step 6

Compute the cost $J(k)$ for the control $u(k)$ as follows:

$$J(k) = u'(k)R(k)u(k) + E[J^*(k+1) \mid Z^{k}, \beta^{k}, U^{k}] \qquad (B.8)$$

where

$$E[J^*(k+1) \mid Z^{k}, \beta^{k}, U^{k}] = \pi(k)\pi_1(k+1)J_{11}(k+1) + \pi(k)[1 - \pi_1(k+1)]J_{12}(k+1) + [1 - \pi(k)]\pi_2(k+1)J_{21}(k+1) + [1 - \pi(k)][1 - \pi_2(k+1)]J_{22}(k+1). \qquad (B.9)$$
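Equation (B.9) weights the four nominal path costs by the probability of each path: $\pi(k)$ or $1 - \pi(k)$ for the preposterior point $m$, times $\pi_m(k+1)$ or $1 - \pi_m(k+1)$ for the hypothesis $j$. A small sketch of (B.8)-(B.9), assuming the path costs $J_{mj}(k+1)$ have already been computed; the numbers in the usage are illustrative and the function names hypothetical.

```python
def expected_future_cost(pi_k, pi1, pi2, J):
    """(B.9): J[m-1][j-1] is the nominal cost J_mj(k+1)."""
    w = [[pi_k * pi1, pi_k * (1.0 - pi1)],
         [(1.0 - pi_k) * pi2, (1.0 - pi_k) * (1.0 - pi2)]]
    return sum(w[m][j] * J[m][j] for m in range(2) for j in range(2))


def total_cost(u, R, pi_k, pi1, pi2, J):
    """(B.8): current control cost u'Ru plus the expected future cost."""
    control_cost = sum(u[a] * R[a][b] * u[b] for a in range(2) for b in range(2))
    return control_cost + expected_future_cost(pi_k, pi1, pi2, J)


cost = total_cost([5.0, 0.0], [[0.01, 0.0], [0.0, 0.01]], 0.5, 0.8, 0.2,
                  [[100.0, 400.0], [400.0, 100.0]])
```

Step 7 then amounts to evaluating this cost for every candidate control and keeping the minimizer.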

Now we compute $J_{mj}(k+1)$ for the sequence $mj$, $m = 1, 2$; $j = 1, 2$.

(i) Estimate $N(k)$. For the example considered,

$$N(k) = \min(N_1(k), N_2(k))$$

where

$$N_l(k) = k + \left[\frac{t_l^*}{T}\right] + 1, \quad l = 1, 2 \qquad (B.10)$$

where $[x]$ is the greatest integer less than or equal to $x$, $T$ is the sampling interval and $t_l^*$ is the positive root of the equation

$$\{(v^{\max})^2 - [\hat{\dot{x}}_l^2(k|k) + \hat{\dot{y}}_l^2(k|k)]\}\,t^2 - 2[\hat{\dot{x}}_l(k|k)(\hat{x}_l(k|k) - \hat{x}_I(k|k)) + \hat{\dot{y}}_l(k|k)(\hat{y}_l(k|k) - \hat{y}_I(k|k))]\,t - [(\hat{x}_l(k|k) - \hat{x}_I(k|k))^2 + (\hat{y}_l(k|k) - \hat{y}_I(k|k))^2] = 0. \qquad (B.11)$$
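Equation (B.11) is the condition that an interceptor flying straight at $v^{\max}$ meets vehicle $l$, which keeps its current velocity, after a time $t$: $(v^{\max} t)^2 = (\Delta x + \dot{x}_l t)^2 + (\Delta y + \dot{y}_l t)^2$ with $\Delta x = \hat{x}_l - \hat{x}_I$ and $\Delta y = \hat{y}_l - \hat{y}_I$. When the interceptor is faster than the vehicle, the leading coefficient is positive and the constant term non-positive, so exactly one root is non-negative. A sketch of (B.10)-(B.11), with hypothetical function names:

```python
import math


def time_to_go(rel_pos, vehicle_vel, v_max):
    """Positive root t* of (B.11).

    rel_pos     -- vehicle position minus interceptor position [dx, dy]
    vehicle_vel -- vehicle velocity [xdot, ydot]
    """
    dx, dy = rel_pos
    vx, vy = vehicle_vel
    a = v_max**2 - (vx**2 + vy**2)
    b = -2.0 * (vx * dx + vy * dy)
    c = -(dx**2 + dy**2)
    if a <= 0.0:
        raise ValueError("the interceptor must be faster than the vehicle")
    return (-b + math.sqrt(b * b - 4.0 * a * c)) / (2.0 * a)


def terminal_index(k, t_star, T):
    """(B.10): N_l(k) = k + floor(t*/T) + 1."""
    return k + math.floor(t_star / T) + 1


t1 = time_to_go([10000.0, 0.0], [150.0, 0.0], 250.0)  # vehicle receding at 150 m/s
t2 = time_to_go([8000.0, 3000.0], [0.0, -100.0], 250.0)
N_k = min(terminal_index(0, t, 3.0) for t in (t1, t2))  # N(k) = min over the two tracks
```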

(ii) Define

$$N_{mj}(k+1) = N(k) \qquad (B.12)$$

$$\pi_{mj}(k+1) = \pi_m(k+1) \qquad (B.13)$$

$$\bar{x}_{lmj}(k+1) = A_l\,\hat{x}_l(k|k), \quad l = 1, 2 \qquad (B.14)$$

$$\bar{x}_{Imj}(k+1) = A_I\,\hat{x}_I(k|k) + B\,u(k). \qquad (B.15)$$

The solution of the LQG problem for the example considered and for the nominal terminal time $N_{mj}(k+1)$ is

$$\bar{u}_{lmj}(k+1) = -[B'K_{mj}(k+2)B + R(k+1)]^{-1}B'K_{mj}(k+2)A\,[\bar{x}_{Imj}(k+1) - \bar{x}_{lmj}(k+1)], \quad l = 1, 2 \qquad (B.16)$$

where $K_{mj}(k+2)$ is given by the backward (Riccati) equation

$$K_{mj}(k+1) = A'\{K_{mj}(k+2) - K_{mj}(k+2)B[B'K_{mj}(k+2)B + R(k+1)]^{-1}B'K_{mj}(k+2)\}A \qquad (B.17)$$

$$K_{mj}[N_{mj}(k+1)] = Q^* \qquad (B.18)$$


and $Q^*$ is the $4 \times 4$ matrix whose elements are all zero except the elements (1, 1), (1, 3), (3, 1) and (3, 3), which are the elements (1, 1), (1, 2), (2, 1) and (2, 2), respectively, of the $Q$ matrix. For the general case, the optimal control $\bar{u}_{lmj}(k+1)$ is a function of $A_l$, $l = 1, 2, I$, and of the different state components of the three state vectors. The solution can be obtained for any specific problem by proper augmentation of the state vector and the transition matrices.

In Step 5, $\overline{\pi^2}(k+1) = E[\pi^2(k+1) \mid Z^k, \beta^k, U^k]$ is given by equation (A.2), which in turn is given by equation (A.6). This involves $\pi_j$, $\nabla\pi_j$ and $\nabla^2\pi_j$, which, for the example considered, are given by (B.5)-(B.7).
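The LQG control (B.16) with the backward recursion (B.17)-(B.18) can be sketched with NumPy as below, acting on the relative state $x_I - x_l$. The sketch assumes, as for the example, that the interceptor and the vehicles share the transition matrix $A$, and uses an illustrative constant-velocity model with state $[x, \dot{x}, y, \dot{y}]'$; the weights in the usage are placeholders, not the values of (69)-(70), and the function names are hypothetical.

```python
import numpy as np


def terminal_weight(Q):
    """Q* of (B.18): embed the 2x2 position weight Q at rows/cols (1, 3) of a 4x4 matrix."""
    Q_star = np.zeros((4, 4))
    pos = [0, 2]  # 0-based indices of x and y in [x, xdot, y, ydot]
    for a in range(2):
        for b in range(2):
            Q_star[pos[a], pos[b]] = Q[a][b]
    return Q_star


def lqg_gains(A, B, R, Q_star, n_steps):
    """Gains L(s) for the last n_steps steps before N, from the Riccati recursion (B.17)
    started at K(N) = Q*; the control is u(s) = -L(s) [x_I(s) - x_l(s)] as in (B.16)."""
    K = Q_star
    gains = []
    for _ in range(n_steps):
        M = B.T @ K @ B + R
        gains.append(np.linalg.solve(M, B.T @ K @ A))
        K = A.T @ (K - K @ B @ np.linalg.solve(M, B.T @ K)) @ A
    return gains[::-1]  # gains[0] is the gain for the earliest step


T = 3.0
A = np.kron(np.eye(2), np.array([[1.0, T], [0.0, 1.0]]))
B = np.kron(np.eye(2), np.array([[T * T / 2.0], [T]]))
L = lqg_gains(A, B, 1e-9 * np.eye(2), terminal_weight([[1.0, 0.0], [0.0, 1.0]]), 1)[0]
e0 = np.array([100.0, 10.0, -50.0, 5.0])  # x_I - x_l one step before the terminal time
e1 = A @ e0 + B @ (-L @ e0)               # near-zero control cost: terminal position ~ 0
```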

If the magnitude of the $j$th component of this control exceeds the bound $u_j^{\max}(k+1)$, then change this component to $u_j^{\max}(k+1)$, with its sign (direction) unchanged. Obtain the DUL control

$$u_{mj}(k+1) = \pi_{mj}(k+1)\,\bar{u}_{1mj}(k+1) + [1 - \pi_{mj}(k+1)]\,\bar{u}_{2mj}(k+1). \qquad (B.19)$$

Truncate this control to $t \cdot u_{mj}(k+1)$, $0 < t < 1$, if necessary, so that the nominal speed of the interceptor at time $k+2$ is equal to $v^{\max}$.


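(iii) For time $i \ge k+2$, define recursively

$$\bar{x}_{lmj}(i) = A_l\,\bar{x}_{lmj}(i-1), \quad l = 1, 2 \qquad (B.20)$$

$$\bar{x}_{Imj}(i) = A_I\,\bar{x}_{Imj}(i-1) + B\,u_{mj}(i-1) \qquad (B.21)$$

$$\bar{\beta}_{lmj}(i) = f[\phi_{lj}, \bar{x}_{Imj}(i), \bar{x}_{lmj}(i)], \quad l = 1, 2 \qquad (B.22)$$

$$\pi_{mj}(i) = \left\{1 + \frac{1 - \pi_{mj}(i-1)}{\pi_{mj}(i-1)} \cdot \frac{p[\bar{\beta}_{mj}(i) \mid Z_{mj}^{i}, \beta_{mj}^{i-1}, U_{mj}^{i-1}, \theta = 2]}{p[\bar{\beta}_{mj}(i) \mid Z_{mj}^{i}, \beta_{mj}^{i-1}, U_{mj}^{i-1}, \theta = 1]}\right\}^{-1} \qquad (B.23)$$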


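Now, estimate $N_{mj}(i)$ as was done for time $k$. If $N_{mj}(i) = i + 1$, then go to (iv). Otherwise, continue and obtain the DUL control

$$u_{mj}(i) = \pi_{mj}(i)\,\bar{u}_{1mj}(i) + [1 - \pi_{mj}(i)]\,\bar{u}_{2mj}(i) \qquad (B.24)$$

where $\bar{u}_{lmj}(i)$ is obtained as for time $k+1$, and $\bar{u}_{lmj}(i)$ and $u_{mj}(i)$ are adjusted, if necessary, as was done for time $k+1$. Go back to (iii). For the example considered,

$$\bar{\beta}_{lmj}(i) = \phi_{lj}\,\lvert\sin[\gamma_{lmj}(i) - \psi_{lmj}(i)]\rvert \qquad (B.25)$$

where

$$\gamma_{lmj}(i) = \tan^{-1}\frac{\bar{y}_{lmj}(i) - \bar{y}_{Imj}(i)}{\bar{x}_{lmj}(i) - \bar{x}_{Imj}(i)} \qquad (B.26)$$

$$\psi_{lmj}(i) = \tan^{-1}\frac{\bar{\dot{y}}_{lmj}(i)}{\bar{\dot{x}}_{lmj}(i)} \qquad (B.27)$$

and $\pi_{mj}(i)$ is given by equation (57).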

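(iv) With $N \equiv N_{mj}(k+1)$ for brevity,

$$J_{mj}(k+1) = \sum_{s=k+1}^{N-1} u_{mj}'(s)R(s)u_{mj}(s) + \pi_{mj}(N-1)\,[\bar{x}_{Imj}(N) - \bar{x}_{1mj}(N)]'\,Q^*\,[\bar{x}_{Imj}(N) - \bar{x}_{1mj}(N)] + [1 - \pi_{mj}(N-1)]\,[\bar{x}_{Imj}(N) - \bar{x}_{2mj}(N)]'\,Q^*\,[\bar{x}_{Imj}(N) - \bar{x}_{2mj}(N)] \qquad (B.28)$$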

For the example considered,

I  1 

•/„,(*. +  !)==  Y  tt.,(M*(s*./lsl 

»  =  l  *  l 

+  *.,(»>•  [,'vi„fi  +  i)  -  >,.,(/  +■  u;: 

+  ;  >i-,(i  +  n  -  + 1 1; 2 ]’  •’ 

+  [I  -  h«,uiH!'>.,(i  +  n  -  v,.,ti  *  n;- 

+  !ii»,(t  +  If  -  tW'  +  I  > ;  ’ ] 1  2  (B  29) 


Step 7

Find the cost $J(k)$ for the different discrete controls and find the control $u^*(k)$ that minimizes $J(k)$. Apply $u^*(k)$ at time $k$.


Step 8

Obtain the observations $z_l(k+1)$, $\beta_l(k+1)$, $l = 1, 2$, and $z_I(k+1)$. Update the estimates $\hat{x}_l(k+1|k+1)$ and $\hat{x}_I(k+1|k+1)$ and the predictions $\hat{x}_l(k+2|k+1)$ using Kalman filters. Update the posterior probability $\pi(k+1)$ using equation (49) and, for the example considered, equation (50). Increment $k$ by one and go to Step 2.
ABSTRACT

A new adaptive dual control solution is presented for the control of a class of multivariable input-output systems. Both rapidly varying random parameters and constant but unknown parameters are included. The new controller modifies the cautious control design by numerator and denominator correction terms. This controller is shown to depend upon sensitivity functions of the expected future cost. A scalar example is presented to provide insight into the properties of the new dual controller. Monte Carlo simulations are performed which show improvement over the cautious controller and the Linear Feedback Dual Controller of [1] and [2].

1. INTRODUCTION

Multivariable systems which are characterized by uncertain parameters with large random variations are a difficult challenge for most control design techniques. The assumed randomness of the parameter variations often precludes the use of gain-scheduling (non-adaptive) control design. Stochastic adaptive control theory provides a principal design approach for systems of this type. Exact solution of the stochastic problem with unknown parameters requires solution of the Stochastic Dynamic Programming equation, and this is not feasible for practical implementation. The solution is known to have a dual effect [1,2] that can be used to enhance the real-time identification of system parameters as well as to provide good control.

Many suboptimal dual solutions have been suggested [1,2,5-11]. The various approaches which have incorporated this dual property can be loosely divided into two classes. In the first class [3-8], the optimal control problem is reformulated to consist of a one-step-ahead criterion to be minimized, augmented by a second term which penalizes poor identification. This approach is attractive due to the analytical tractability of the solution; however, the solution is based on a one-step criterion and does not fully exploit the dual property of a multi-step solution. Padilla and Cruz [14] give a dual control solution for such a plant by minimizing the control objective function subject to an upper bound on the total estimation cost. Their objective function includes a standard control objective function and also a second constraint term which reflects the sensitivity of the parameters to the state of the system. Thus the solution adjusts itself to exercise better estimation for such sensitive parameters within the upper bound. The second class [9-11] utilizes the stochastic dynamic programming equation directly and performs a linearization of the future cost in order to obtain a solution. Previous control solutions in this second class require a numerical search procedure, which poses difficulties for practical on-line control of multivariable systems.

*Supported by NASA Ames Research Center Grant NAG 2-213; Y. Bar-Shalom was also supported by Air Force Office of Scientific Research Grant AFOSR 80-0098.

The linear feedback dual controller of [1,2] is based upon a first order Taylor series expansion of the expected future cost and is called the first order dual (FOD). It offers some improvement over the non-dual cautious control based upon a one-step criterion. The results are based upon a simulation model with constant but unknown parameters. Although the dual control offers some improvement over the cautious controller, the improvement is not significant for most practical applications where the system contains constant parameters and the objective is to control in steady-state operation. However, for random parameter variations, dual control can sometimes offer significant improvement over non-dual controllers [5,9]. The FOD of [1,2] is attractive due to its simplicity (it is comparable to the cautious control design in algorithm complexity and does not require numerical search). The objective of the present study is to evaluate the cautious controller and the FOD for large random parameter variations modeled as a random walk. Monte Carlo simulations are performed, and the conditions under which the dual controller offers significant improvement over a non-dual cautious controller are quantified.

The FOD, although offering a reduction in the average cost, is found to be unacceptable in many cases. This is attributed to the sensitivity of the expected future cost whenever the system is characterized by limited controllability. A second order extension of the linearization procedure of [1,2] is presented to account for this sensitivity. This new second order dual controller (SOD) inherently includes a robustness property, in that the controller accounts for the sensitivity of the expected future cost to the parameter estimates and their uncertainty. Simulations are presented which show the improvement of the SOD over the cautious controller and the FOD. The SOD uses a Newton-type search procedure and is developed for multivariable systems. One of the main advantages of the SOD presented herein is that it modifies the cautious controller with a numerator "probing" term and a denominator correction term. Although the SOD is still considered too complex for practical implementation, the structure of the control solution is in a form which permits practical design changes to the cautious controller to include the dual properties.

Section 2 gives the problem formulation. The approximate dual controller for the multivariable input-output system is developed in Section 3. Section 4 analyzes this dual controller for a scalar example with one unknown parameter. Section 5 presents simulation results, and Section 6 concludes the paper.

2. PROBLEM FORMULATION

The multivariable system under investigation is

$$x(k+1) = c(k) + B(k)\,u(k) \qquad (2.1)$$

where c(k) is an unknown vector and B(k) is a matrix of unknown parameters. The unknown elements of c(k) and B(k) are denoted as θ(k), with covariance matrix P(k). These are represented by a discrete random model

$$\theta(k+1) = A\,\theta(k) + v(k) \qquad (2.2)$$

$$E\{v(k)\} = 0, \qquad E\{v(k)v'(j)\} = V\,\delta_{kj} \qquad (2.3)$$

The measurement equation is

$$y(k) = x(k) + w(k) \qquad (2.4)$$

where

$$E\{w(k)\} = 0, \qquad E\{w(k)w'(j)\} = W\,\delta_{kj}, \qquad E\{w(k)v'(j)\} = 0 \qquad (2.5)$$

and x(k) and y(k) are n-dimensional vectors. The control criterion to be minimized is the expected value of the cost from step 0 to N,

$$J(0) = E\{C(0)\} = E\Big\{\sum_{k=1}^{N} x'(k)Qx(k) + u'(k-1)Ru(k-1)\Big\} \qquad (2.6)$$

where N = 2 for the two-step-ahead criterion.
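As a concrete reading of (2.1)-(2.6), the sketch below simulates the plant row by row and accumulates the realized cost C(0) for a given open-loop control sequence. It assumes that every parameter row $\theta^\ell = [c_\ell \;\; B_\ell]'$ evolves with the same A and V (the text does not say how A and V act across rows), and the function names are illustrative only.

```python
import numpy as np


def step_model(theta_rows, u, A, V, W, rng):
    """One step of (2.1)-(2.4).

    theta_rows: (n, m) array whose row l is [c_l, B_l], m = 1 + dim(u).
    Assumption: every row evolves with the same A (m x m) and noise covariance V (m x m).
    """
    h = np.concatenate(([1.0], u))                       # regressor [1, u'] so x_l = theta_l' h
    x_next = theta_rows @ h                              # (2.1): x(k+1) = c(k) + B(k) u(k)
    y_next = x_next + rng.multivariate_normal(np.zeros(len(x_next)), W)  # (2.4)
    n, m = theta_rows.shape
    v = rng.multivariate_normal(np.zeros(m), V, size=n)  # (2.3), one draw per row
    theta_next = theta_rows @ A.T + v                    # (2.2) applied to each row
    return x_next, y_next, theta_next


def realized_cost(theta0, u_seq, A, V, W, Q, R, seed=0):
    """Realized value of C(0) in (2.6) for a given open-loop control sequence."""
    rng = np.random.default_rng(seed)
    theta, cost = theta0, 0.0
    for u in u_seq:                                      # u(0), ..., u(N-1)
        x, _, theta = step_model(theta, u, A, V, W, rng)
        cost += x @ Q @ x + u @ R @ u                    # x'(k)Qx(k) + u'(k-1)Ru(k-1)
    return cost


# Usage: N = 2, one state, one input; V = 0 keeps the parameters fixed.
cost = realized_cost(np.array([[1.0, 0.5]]), [np.array([0.2]), np.array([-0.1])],
                     A=np.eye(2), V=np.zeros((2, 2)), W=0.01 * np.eye(1),
                     Q=np.eye(1), R=np.zeros((1, 1)))   # 1.1**2 + 0.95**2 = 2.1125
```

The measurement noise W only affects y(k), so it does not enter the realized cost, which depends on the true state x(k).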

3.  APPROXIMATE  DUAL  CONTROLLER  FOR  TWO  STEP  CRITERION 

The minimization of (2.6) with respect to u(0) and u(1), subject to (2.1)-(2.5), is obtained from the stochastic dynamic programming equation [12,13]

$$J^*(k) = \min_{u(k)} E\{C(k) + J^*(k+1) \mid Y^k\}, \qquad k = N-1, \ldots, 1, 0 \qquad (3.1)$$

where $J^*(k)$ is the "cost-to-go" from k to N and $Y^k$ is the cumulative information at time k, when the control u(k) is to be determined. For k = 0, (3.1) is

$$J^*(0) = \min_{u(0)} E\{x'(1)Qx(1) + u'(0)Ru(0) + J^*(1) \mid Y^0\} \qquad (3.2)$$

where $J^*(1)$ is the optimal cost at the last step and is obtained by minimization of $J(N-1)$ for N = 2. Assuming diagonal $Q = \mathrm{diag}(q_\ell)$, this results in [1,2]


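$$J^*(1) = \hat c'(1)Q\hat c(1) + \sum_{\ell=1}^{n} q_\ell P^\ell_c(1) - \Big[\hat c'(1)Q\hat B(1) + \sum_{\ell=1}^{n} q_\ell P^\ell_{cB}(1)\Big]\Big[\hat B'(1)Q\hat B(1) + \sum_{\ell=1}^{n} q_\ell P^\ell_B(1) + R\Big]^{-1}\Big[\hat B'(1)Q\hat c(1) + \sum_{\ell=1}^{n} q_\ell P^\ell_{Bc}(1)\Big] \qquad (3.3)$$

and

$$u^*(1) = -\Big[\hat B'(1)Q\hat B(1) + \sum_{\ell=1}^{n} q_\ell P^\ell_B(1) + R\Big]^{-1}\Big[\hat B'(1)Q\hat c(1) + \sum_{\ell=1}^{n} q_\ell P^\ell_{Bc}(1)\Big] \qquad (3.4)$$

where

$$P^\ell(1) = \begin{bmatrix} P^\ell_c(1) & P^\ell_{cB}(1) \\ P^\ell_{Bc}(1) & P^\ell_B(1) \end{bmatrix} \qquad (3.5)$$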
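The last-stage solution (3.3)-(3.4) can be checked numerically. The sketch below (hypothetical helper `last_stage`, NumPy assumed available) weights the partitioned row covariances of (3.5) by $q_\ell$, returns $u^*(1)$ and $J^*(1)$, and can be verified by evaluating the expected quadratic cost directly at $u^*(1)$.

```python
import numpy as np


def last_stage(c_hat, B_hat, P_rows, q, R):
    """Last-stage cautious control u*(1) of (3.4) and cost-to-go J*(1) of (3.3).

    c_hat: (n,) estimate of c(1); B_hat: (n, r) estimate of B(1).
    P_rows[l]: (1+r, 1+r) covariance of row l, partitioned as in (3.5).
    q: (n,) diagonal of Q; R: (r, r) control weight.
    """
    Q = np.diag(q)
    Pc = sum(q[l] * P[0, 0] for l, P in enumerate(P_rows))     # sum_l q_l P_c^l
    PBc = sum(q[l] * P[1:, 0] for l, P in enumerate(P_rows))   # sum_l q_l P_Bc^l
    PB = sum(q[l] * P[1:, 1:] for l, P in enumerate(P_rows))   # sum_l q_l P_B^l
    M = B_hat.T @ Q @ B_hat + PB + R                            # bracket inverted in (3.3)-(3.4)
    g = B_hat.T @ Q @ c_hat + PBc
    u1 = -np.linalg.solve(M, g)                                 # (3.4)
    J1 = c_hat @ Q @ c_hat + Pc - g @ np.linalg.solve(M, g)     # (3.3), using P_cB = P_Bc'
    return u1, J1


# Usage: two outputs, one input (illustrative numbers).
P_rows = [np.array([[0.2, 0.05], [0.05, 0.4]]), np.array([[0.1, 0.0], [0.0, 0.3]])]
u1, J1 = last_stage(np.array([1.0, -0.5]), np.array([[0.3], [0.8]]), P_rows,
                    q=np.array([1.0, 2.0]), R=np.array([[0.1]]))
```

Because the covariance terms enter both brackets additively, a large $P^\ell_B(1)$ shrinks the gain, which is the "cautious" property of the last-stage control.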


Here $\hat\theta(1)$ is the expected value of θ(1) for time step 2 given the measurement y(1) at time step 1. The index ℓ represents the row number in (2.1), and $P^\ell(1)$ is the associated parameter covariance.

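The parameter estimates $\hat\theta(1)$ and covariances P(1) are obtained from the Kalman filter. Since W is diagonal, the estimation can be decoupled row by row. Then

$$\hat\theta^\ell(1) = A\hat\theta^\ell(0) + AK^\ell(1)\,\nu^\ell(1) \qquad (3.6)$$

$$K^\ell(1) = P^\ell(0)H'(1)\,\big[H(1)P^\ell(0)H'(1) + W_\ell\big]^{-1} \qquad (3.7)$$

$$\tilde P^\ell(1) = P^\ell(0) - K^\ell(1)H(1)P^\ell(0) \qquad (3.8)$$

$$P^\ell(1) = A\tilde P^\ell(1)A' + V \qquad (3.9)$$

where

$$\nu^\ell(1) = y_\ell(1) - H(1)\hat\theta^\ell(0) \qquad (3.10)$$

$$H(1) = [1 \;\; u'(0)] \qquad (3.11)$$

$$\theta^\ell(1) = [c_\ell(1) \;\; B_\ell(1)]', \qquad \ell = 1, 2, \ldots, n \ \ (\text{the } \ell\text{th row of } \theta) \qquad (3.12)$$

and $W_\ell$ is the ℓth diagonal element of W.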
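A minimal per-row implementation of the filter (3.6)-(3.11) is sketched below. The function name `row_filter` is hypothetical, and V is assumed to be the m×m parameter-noise covariance for a single row.

```python
import numpy as np


def row_filter(theta0_row, P0_row, u0, y1_row, A, V, W_l):
    """Kalman update (3.6)-(3.11) for parameter row l; returns (theta_hat^l(1), P^l(1))."""
    H = np.concatenate(([1.0], u0))               # (3.11): H(1) = [1  u'(0)]
    S = H @ P0_row @ H + W_l                      # innovation variance, cf. (3.15)
    K = P0_row @ H / S                            # (3.7)
    nu = y1_row - H @ theta0_row                  # (3.10)
    theta1 = A @ theta0_row + (A @ K) * nu        # (3.6)
    P_tilde = P0_row - np.outer(K, H @ P0_row)    # (3.8)
    P1 = A @ P_tilde @ A.T + V                    # (3.9)
    return theta1, P1


# Usage: c known (zero variance), one input, values of (4.10) except a = 0.9.
theta1, P1 = row_filter(np.array([1.0, 0.05]), np.diag([0.0, 0.5]), np.array([-0.6]),
                        0.9, A=np.diag([1.0, 0.9]), V=np.diag([0.0, 0.1]), W_l=0.1)
```

With zero variance on c, the c entry is never updated, and the b entry of P1 reduces to the scalar covariance $a^2P_b(0)W/(P_b(0)u^2(0)+W)+V$ used in Section 4.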


As discussed in [1] and [2], $J^*(1)$ is a nonlinear function of the parameter estimates $\hat\theta(1)$ and covariances P(1), and thus a linearization was performed. In [1] a scalar formulation was presented and a first order linearization was performed about the nominal parameter estimate squared $(\hat\theta(0))^2$ and the nominal covariance $\bar P(1)$. Also in [1,2] the vector case was presented and linearized to first order. To account more accurately for the dual effect, a second order Taylor series expansion about $\hat\theta(0)$ and a first order expansion about the nominal covariance $\bar P(1)$ are presented here. In addition (as will be presented subsequently), the covariance P(1) will include a linearization to second order in u(0).

In [1,2], P(1) was linearized to first order. It is believed that second order linearizations are necessary to better account for the nonlinearity in P(1) and $\hat\theta(1)$ of (3.3) and in u(0) of (3.7) and (3.8). In addition, a nonlinear Newton algorithm is used in the second order approximation.

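Linearization of (3.3) about the nominal $\hat\theta(1) = A\hat\theta(0)$ and $\bar P(1)$, using the nominal $\bar u(0)$, results in

$$J^*(1) = J^*[1,\hat\theta(0),\bar P(1)] + \frac{\partial J^*(1)}{\partial\hat\theta(1)}\,[\hat\theta(1) - A\hat\theta(0)] + \frac12\,[\hat\theta(1) - A\hat\theta(0)]'\,\frac{\partial^2 J^*(1)}{\partial\hat\theta^2(1)}\,[\hat\theta(1) - A\hat\theta(0)] + \sum_{\ell=1}^{n}\sum_{i=1}^{m}\sum_{j=1}^{m}\frac{\partial J^*(1)}{\partial P^\ell_{ij}(1)}\,\big[P^\ell_{ij}(1) - \bar P^\ell_{ij}(1)\big] \qquad (3.13)$$

where the superscript ℓ denotes the covariance matrix associated with the ℓth row of parameters, $P^\ell_{ij}(1)$ is the (i, j)th element of the covariance matrix $P^\ell(1)$, and m is the number of unknown parameters in each row.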

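Using (3.6), the expected value of (3.13) is

$$E[J^*(1)\mid Y^0] = J^*[1,\hat\theta(0),\bar P(1)] + \frac12\,\mathrm{tr}\!\left[\frac{\partial^2 J^*(1)}{\partial\hat\theta^2(1)}\,AK(1)\,E\{\nu(1)\nu'(1)\mid Y^0\}\,K'(1)A'\right] + \sum_{\ell=1}^{n}\sum_{i=1}^{m}\sum_{j=1}^{m}\frac{\partial J^*(1)}{\partial P^\ell_{ij}(1)}\,\big[P^\ell_{ij}(1) - \bar P^\ell_{ij}(1)\big] \qquad (3.14)$$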

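Using (3.7), (3.8) and the innovation covariance

$$E\{\nu^\ell(1)\,\nu^{\ell\prime}(1)\mid Y^0\} = H(1)P^\ell(0)H'(1) + W_\ell \qquad (3.15)$$

(3.14) can be written as

$$E[J^*(1)\mid Y^0] = J^*[1,\hat\theta(0),\bar P(1)] + \sum_{\ell=1}^{n}\sum_{i=1}^{m}\sum_{j=1}^{m}\left\{\frac{\partial J^*(1)}{\partial P^\ell_{ij}(1)}\big[P^\ell_{ij}(1) - \bar P^\ell_{ij}(1)\big] - \frac12\,\frac{\partial^2 J^*(1)}{\partial\hat\theta^\ell_i(1)\,\partial\hat\theta^\ell_j(1)}\big[P^\ell_{ij}(1) - (AP^\ell(0)A')_{ij} - V_{ij}\big]\right\} \qquad (3.16)$$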
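The passage from (3.14) to (3.16) rests on the identity $AK^\ell(1)\,E\{\nu^\ell(1)^2\mid Y^0\}\,K^{\ell\prime}(1)A' = AP^\ell(0)A' + V - P^\ell(1)$, which follows from (3.7)-(3.9) and (3.15). The sketch below checks it numerically for one row with random data (dimensions and values are arbitrary, chosen only for illustration). Since the Hessian is symmetric, the trace term of (3.14) then becomes the double sum over i, j in (3.16).

```python
import numpy as np

rng = np.random.default_rng(0)
m = 3                                             # parameters per row (illustrative)
L = rng.normal(size=(m, m))
P0 = L @ L.T + 0.1 * np.eye(m)                    # prior covariance P^l(0), symmetric positive definite
A = rng.normal(size=(m, m))
V = 0.05 * np.eye(m)
W_l = 0.2
H = np.concatenate(([1.0], rng.normal(size=m - 1)))   # H(1) = [1  u'(0)] for a random u(0)

S = H @ P0 @ H + W_l                              # innovation variance (3.15)
K = P0 @ H / S                                    # gain (3.7)
P1 = A @ (P0 - np.outer(K, H @ P0)) @ A.T + V     # (3.8)-(3.9)

lhs = A @ np.outer(K, K) @ A.T * S                # A K E{nu^2} K' A'
rhs = A @ P0 @ A.T + V - P1                       # bracket appearing in (3.16)
print(np.allclose(lhs, rhs))                      # True
```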


The expected future cost (3.16) is thus seen to be a function of the predicted covariance $P^\ell_{ij}(1)$, with multipliers given by the sensitivities

$$\frac{\partial J^*(1)}{\partial P^\ell_{ij}(1)} \qquad\text{and}\qquad \frac{\partial^2 J^*(1)}{\partial\hat\theta^\ell_i(1)\,\partial\hat\theta^\ell_j(1)}, \qquad \text{for all } i, j, \ell.$$

Since the covariance $P^\ell_{ij}(1)$ depends on the control u(0), the control has the dual effect. It should be noted that the importance of the dual effect depends upon the sensitivity of the expected future cost with respect to both the covariance and the parameter estimate.

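The optimal control u(0) can be computed by minimization of (3.2) using (3.16). Since $P^\ell_{ij}(1)$ is nonlinear in u(0), a numerical search procedure is required. This is accomplished using a second order linearization in u(0). Thus (3.8) is linearized to second order about the control $u^I(0)$, which is in the vicinity of the optimal control:

$$P^\ell_{ij}(1) = P^\ell_{ij}(1)\Big|_{u^I(0)} + \left[\frac{\partial P^\ell_{ij}(1)}{\partial u(0)}\Big|_{u^I(0)}\right]'\big[u(0) - u^I(0)\big] + \frac12\,\big[u(0) - u^I(0)\big]'\,\frac{\partial^2 P^\ell_{ij}(1)}{\partial u(0)\,\partial u'(0)}\Big|_{u^I(0)}\,\big[u(0) - u^I(0)\big] \qquad (3.17)$$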


The expected future cost given by (3.16) and (3.17) is quadratic in u(0), and thus a closed form solution u*(0) is obtained by minimization of (3.2).

The optimal dual control u*(0) can now be computed from (3.2) using (3.16) and (3.17). It is obtained by solving

$$\frac{\partial}{\partial u(0)}\,E\{x'(1)Qx(1) + u'(0)Ru(0) + J^*(1)\mid Y^0\} = 0 \qquad (3.18)$$


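The optimal u*(0) is thus

$$u^*(0) = -\Big[\hat B'(0)Q\hat B(0) + \sum_{\ell=1}^{n}\big(q_\ell P^\ell_B(0) + F_\ell\big) + R\Big]^{-1}\Big[\hat B'(0)Q\hat c(0) + \sum_{\ell=1}^{n}\big(q_\ell P^\ell_{Bc}(0) + f_\ell\big)\Big] \qquad (3.19)$$

where the matrix $F_\ell$ and the vector $f_\ell$ are

$$F_\ell = \frac12\sum_{i=1}^{m}\sum_{j=1}^{m}\left(\frac{\partial J^*(1)}{\partial P^\ell_{ij}(1)} - \frac12\,\frac{\partial^2 J^*(1)}{\partial\hat\theta^\ell_i(1)\,\partial\hat\theta^\ell_j(1)}\right)\frac{\partial^2 P^\ell_{ij}(1)}{\partial u(0)\,\partial u'(0)}\,\Bigg|_{u^I(0),\,\hat\theta(0),\,\bar P(1)} \qquad (3.20)$$

$$f_\ell = \frac12\sum_{i=1}^{m}\sum_{j=1}^{m}\left(\frac{\partial J^*(1)}{\partial P^\ell_{ij}(1)} - \frac12\,\frac{\partial^2 J^*(1)}{\partial\hat\theta^\ell_i(1)\,\partial\hat\theta^\ell_j(1)}\right)\left(\frac{\partial P^\ell_{ij}(1)}{\partial u(0)} - \frac{\partial^2 P^\ell_{ij}(1)}{\partial u(0)\,\partial u'(0)}\,u^I(0)\right)\Bigg|_{u^I(0),\,\hat\theta(0),\,\bar P(1)} \qquad (3.21)$$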
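Once the sensitivities are available, (3.19)-(3.21) reduce to plain matrix arithmetic. The sketch below uses hypothetical helpers `dual_terms` and `dual_control`, and the array layouts are assumptions. It computes $F_\ell$ and $f_\ell$ for one row and then u*(0). With every $F_\ell$ and $f_\ell$ set to zero, it reduces to the cautious control.

```python
import numpy as np


def dual_terms(dJ_dP, d2J_dtheta2, dP_du, d2P_du2, u_I):
    """F_l and f_l of (3.20)-(3.21) for one parameter row l.

    dJ_dP, d2J_dtheta2: (m, m) arrays of dJ*(1)/dP_ij(1) and d2J*(1)/dtheta_i dtheta_j.
    dP_du: (m, m, r) gradients dP_ij(1)/du(0); d2P_du2: (m, m, r, r) Hessians; u_I: (r,).
    """
    w = 0.5 * (dJ_dP - 0.5 * d2J_dtheta2)            # common weight of each (i, j) term
    F = np.einsum("ij,ijab->ab", w, d2P_du2)          # (3.20)
    f = np.einsum("ij,ija->a", w, dP_du) - F @ u_I    # (3.21)
    return F, f


def dual_control(c_hat, B_hat, q, P_rows, R, F_list, f_list):
    """u*(0) of (3.19); P_rows[l] is partitioned as in (3.5). Zero F_l, f_l give the cautious control."""
    Q = np.diag(q)
    M = B_hat.T @ Q @ B_hat + R
    g = B_hat.T @ Q @ c_hat
    for l, (F, f) in enumerate(zip(F_list, f_list)):
        M = M + q[l] * P_rows[l][1:, 1:] + F          # q_l P_B^l(0) + F_l
        g = g + q[l] * P_rows[l][1:, 0] + f           # q_l P_Bc^l(0) + f_l
    return -np.linalg.solve(M, g)


# Scalar check with the sensitivities reported in (4.17), one unknown parameter (m = r = 1).
F, f = dual_terms(np.array([[0.0075]]), np.array([[-3.47]]),
                  np.array([[[0.382]]]), np.array([[[[1.0]]]]), np.array([-0.6]))
```

With the sensitivities of (4.17) ($\partial J^*/\partial P_b = .0075$, $\partial^2 J^*/\partial\hat b^2 = -3.47$, $\partial P_b/\partial u = .382$, $\partial^2 P_b/\partial u^2 = 1.0$ at $u^I(0) = -.6$), this gives F ≈ .871 and f ≈ .856. These agree with the values $F_\ell = .87$ and $f_\ell = .85$ quoted in Section 4.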


Initially, the nominal value of u(0) is computed from (3.19) with $F_\ell$ and $f_\ell$ equal to zero. A gradient search is then performed until u(0) is in the vicinity of the optimal u*(0). After that, (3.19)-(3.21) are used until convergence is achieved. This iteration procedure is essentially Newton's method for the minimization of a nonlinear function. The gradient search is used first because the stochastic cost (3.2) being minimized is a highly nonlinear function of u(0). The gradient procedure is therefore run until $u^I(0)$ is in the vicinity of the minimum before switching to the Newton method. The nominal covariance $\bar P^\ell(1)$ is computed from (3.7)-(3.11) with u(0) = ū(0). The sensitivities (partials) of the cost $J^*(1)$ in (3.20) and (3.21) are computed from the partial derivatives of $J^*(1)$ in (3.3) and of $P^\ell(1)$ in (3.7)-(3.9), evaluated at the nominal. The partials of the covariance are evaluated at $u^I(0)$, the value obtained at the previous iteration I.
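The iteration described above can be sketched for the scalar plant of Section 4 (Q = 1, R = 0, c known), using the partials (4.1)-(4.6). The step size, the rule for switching from the gradient phase to the Newton phase, and the tolerances are assumptions that the text does not give. In the Newton phase, each pass re-evaluates (3.19) with F and f from (3.20)-(3.21) at the previous iterate. This is exactly a Newton step on the quadratic-in-u approximation of the cost, with the $J^*$ sensitivities held at the nominal as described above.

```python
def scalar_sod(b_hat, P0, a, V, W, c, lr=0.5, switch_tol=0.05, tol=1e-10, max_iter=200):
    """Approximate two-step dual control u*(0) for the scalar example of Section 4.

    Plant x(1) = c + b u(0) with c known, Q = 1, R = 0. Returns (u_bar, u_star).
    lr, switch_tol, tol and max_iter are illustrative choices, not values from the text.
    """
    u_bar = -b_hat * c / (b_hat**2 + P0)                           # nominal (cautious) control
    P1_bar = a**2 * P0 * W / (P0 * u_bar**2 + W) + V               # nominal covariance (4.6)
    s = b_hat**2 + P1_bar
    dJ_dP = c**2 * b_hat**2 / s**2                                 # (4.1)
    d2J_db2 = -2 * c**2 * P1_bar * (P1_bar - 3 * b_hat**2) / s**3  # (4.2)
    k = 0.5 * (dJ_dP - 0.5 * d2J_db2)                              # weight in (3.20)-(3.21)

    def dP(u):                                                     # (4.3)
        return -2 * a**2 * P0**2 * W * u / (P0 * u**2 + W) ** 2

    def d2P(u):                                                    # second derivative of P_b(1)
        return -2 * a**2 * P0**2 * W * (W - 3 * P0 * u**2) / (P0 * u**2 + W) ** 3

    def half_grad(u):      # half of d/du of E{x^2(1)} + E{J*(1)} under (3.16)-(3.17)
        return b_hat * c + (b_hat**2 + P0) * u + k * dP(u)

    u = u_bar
    for _ in range(max_iter):                                      # phase 1: gradient search
        curvature = b_hat**2 + P0 + k * d2P(u)                     # Newton denominator of (3.19)
        if curvature > 0 and abs(half_grad(u)) < switch_tol:
            break
        u -= lr * half_grad(u)
    for _ in range(max_iter):                                      # phase 2: Newton via (3.19)-(3.21)
        F = k * d2P(u)
        f = k * (dP(u) - d2P(u) * u)
        u_new = -(b_hat * c + f) / (b_hat**2 + P0 + F)
        if abs(u_new - u) < tol:
            return u_bar, u_new
        u = u_new
    raise RuntimeError("Newton phase did not converge")


u_bar, u_star = scalar_sod(b_hat=0.05, P0=0.5, a=1.0, V=0.1, W=0.1, c=1.0)   # values of (4.10)
```

For the values of (4.10), the denominator $\hat b^2(0) + P_b(0) + F$ is negative at the cautious control. A pure Newton start from ū(0) would therefore head toward a maximum, which is consistent with the text's use of a gradient phase first. Because $\partial^2 J^*/\partial\hat b^2$ is computed here from (4.2) rather than taken from the rounded values in (4.17), the result need not coincide exactly with (4.18).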

The approximate two-step-ahead dual control of (3.19)-(3.21) can be interpreted as a modification of the cautious controller by the terms $F_\ell$ and $f_\ell$. These terms depend upon the sensitivity of the future nominal cost $J^*(1)$ with respect to the parameters $\hat\theta^\ell_i(1)$, $\hat\theta^\ell_j(1)$ for all i, j, and with respect to their covariance $P^\ell_{ij}(1)$ for each row ℓ of parameters. Whenever these sensitivities are large, the terms $F_\ell$ and $f_\ell$ will be significant (that is, the dual effect will be important). The control solution thus takes into account the sensitivity of the nominal future cost to parameter variation and uncertainty. The larger this sensitivity, the more important the dual effect.

The resulting dual controller (3.19) exhibits a robustness property with respect to parameter variations and uncertainty of the future cost by including a term ($F_\ell$) which appears in the denominator of the dual controller. In addition, a probing term ($f_\ell$) appears in the numerator.


4.  SCALAR  EXAMPLE  WITH  ONE  UNKNOWN  PARAMETER 

To further understand the dual control solution, a scalar example with one unknown parameter b is presented. The approximate dual control solution for this scalar case, using Q = 1 and R = 0, is given by (3.19)-(3.21) with $P^\ell(1)$ and $\hat\theta(0)$ replaced by $P_b(1)$ and $\hat b(0)$, respectively.

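The partials required in the control law are

$$\frac{\partial J^*(1)}{\partial P_b(1)}\Big|_{\hat b(0),\,\bar P_b(1)} = \frac{c^2\hat b^2(0)}{\big(\hat b^2(0) + \bar P_b(1)\big)^2} \qquad (4.1)$$

$$\frac{\partial^2 J^*(1)}{\partial\hat b(1)\,\partial\hat b(1)}\Big|_{\hat b(0),\,\bar P_b(1)} = -\frac{2c^2\bar P_b(1)\big(\bar P_b(1) - 3\hat b^2(0)\big)}{\big(\hat b^2(0) + \bar P_b(1)\big)^3} \qquad (4.2)$$

$$\frac{\partial P_b(1)}{\partial u(0)}\Big|_{u^I(0)} = -\frac{2a^2P_b^2(0)W\,u^I(0)}{\big(P_b(0)u^{I2}(0) + W\big)^2} \qquad (4.3)$$

$$\frac{\partial^2 P_b(1)}{\partial u(0)\,\partial u(0)}\Big|_{u^I(0)} = -\frac{2a^2P_b^2(0)W\,\big(W - 3P_b(0)u^{I2}(0)\big)}{\big(P_b(0)u^{I2}(0) + W\big)^3} \qquad (4.4)$$

where the nominal ū(0) and $\bar P_b(1)$ are

$$\bar u(0) = -\frac{\hat b(0)\,c}{\hat b^2(0) + P_b(0)} \qquad (4.5)$$

$$\bar P_b(1) = \frac{a^2P_b(0)W}{P_b(0)\bar u^2(0) + W} + V \qquad (4.6)$$

The parameter estimate $\hat b(0)$ and the covariance $P_b(0)$ are computed using data up to k = 0 (i.e., y(0)).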
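Closed forms like (4.1)-(4.4) are easy to mistype, so an independent check is useful. The sketch below codes them directly (function names are illustrative). It compares each one with a central finite difference of $J^*(1) = c^2P_b(1)/(\hat b^2(1) + P_b(1))$, the scalar form of (3.3) with Q = 1, R = 0 and c known, and of $P_b(1)$ as a function of u(0) from (4.6). The check uses the values of (4.10) with $P_b(1) = .575$ and u(0) = -.6.

```python
def J_star_1(b, P, c):
    """Scalar J*(1) from (3.3) with Q = 1, R = 0 and c known: c^2 P / (b^2 + P)."""
    return c**2 * P / (b**2 + P)


def P1_of_u(u, P0, a, V, W):
    """P_b(1) as a function of u(0), as in (4.6)."""
    return a**2 * P0 * W / (P0 * u**2 + W) + V


def dJ_dP(b, P, c):            # (4.1)
    return c**2 * b**2 / (b**2 + P) ** 2


def d2J_db2(b, P, c):          # (4.2)
    return -2 * c**2 * P * (P - 3 * b**2) / (b**2 + P) ** 3


def dP1_du(u, P0, a, W):       # (4.3)
    return -2 * a**2 * P0**2 * W * u / (P0 * u**2 + W) ** 2


def d2P1_du2(u, P0, a, W):     # second derivative of P_b(1) in u(0)
    return -2 * a**2 * P0**2 * W * (W - 3 * P0 * u**2) / (P0 * u**2 + W) ** 3


# Central differences; each analytic value should agree to several decimal places.
h = 1e-4
b, P, c, u, P0, a, V, W = 0.05, 0.575, 1.0, -0.6, 0.5, 1.0, 0.1, 0.1
fd_dJ_dP = (J_star_1(b, P + h, c) - J_star_1(b, P - h, c)) / (2 * h)
fd_d2J_db2 = (J_star_1(b + h, P, c) - 2 * J_star_1(b, P, c) + J_star_1(b - h, P, c)) / h**2
fd_dP1_du = (P1_of_u(u + h, P0, a, V, W) - P1_of_u(u - h, P0, a, V, W)) / (2 * h)
fd_d2P1_du2 = (P1_of_u(u + h, P0, a, V, W) - 2 * P1_of_u(u, P0, a, V, W)
               + P1_of_u(u - h, P0, a, V, W)) / h**2
```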

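The expected future cost based upon the linearization (3.16) is

$$E\{J^*(1)\mid Y^0\} = c^2 - \frac{c^2\hat b^2(0)}{\hat b^2(0) + \bar P_b(1)} - \frac12\,\frac{\partial^2 J^*(1)}{\partial\hat b^2(1)}\,\big[P_b(1) - a^2P_b(0) - V\big] + \frac{\partial J^*(1)}{\partial P_b(1)}\,\big[P_b(1) - \bar P_b(1)\big] \qquad (4.7)$$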


4.1 Evaluation of the Cautious Controller

The performance of the cautious controller can be evaluated using (3.2) with u(0) set to the nominal ū(0):

$$\bar J(0) = \Big[E\{x^2(1)\mid Y^0\} + E\{J^*(1)\mid Y^0\}\Big]_{\bar u(0),\,\hat b(0)} \qquad (4.8)$$

The first term in (4.8) represents the expected cost at k = 1. The second term represents the expected future cost at k = 2, using the cautious control u(1) for step k = 2 and the cautious control u(0) = ū(0) for step k = 1. (4.8) is evaluated using the data $Y^0$.

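Using (4.1)-(4.7), (4.8) becomes

$$\bar J(0) = c^2 - \frac{c^2\hat b^2(0)}{\hat b^2(0) + P_b(0)} + c^2 + \frac12\,\frac{\partial^2 J^*(1)}{\partial\hat b^2(1)}\,\frac{a^2P_b^2(0)\,\bar u^2(0)}{P_b(0)\bar u^2(0) + W} - \frac{c^2\hat b^2(0)}{\hat b^2(0) + \bar P_b(1)} \qquad (4.9)$$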
The last term in (4.7) is zero since $P_b(1)$ evaluated at the nominal control (i.e., the cautious control) equals $\bar P_b(1)$. The first two terms in (4.9) represent the average cost at step k = 1, and the last three terms represent the expected future cost at k = 2 using the cautious control.

A simple example can be used with (4.9) to demonstrate when the cautious control is expected to behave poorly. Assume a scalar example with one unknown parameter b, and let

$$\hat b(0) = .05, \quad P_b(0) = .5, \quad a = 1.0, \quad V = .1, \quad W = .1, \quad c = 1 \qquad (4.10)$$

The expected cost at k = 1 and k = 2 is computed from the nominal ū(0), $\bar P_b(1)$ and $\partial^2 J^*(1)/\partial\hat b^2(1)$, which yields

$$\bar u(0) = -.1, \qquad \bar P_b(1) = .575, \qquad \frac{\partial^2 J^*(1)}{\partial\hat b^2(1)}\Big|_{\bar u(0)} = -3.47 \qquad (4.11)$$

and

$$\bar J(0) \approx c^2 + c^2 \qquad (4.12)$$
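The numbers in (4.11)-(4.12) can be reproduced from (4.5), (4.6) and (4.9). The sketch below (illustrative function name) evaluates the two groups of terms in (4.9) separately: the expected cost at k = 1 and the expected future cost at k = 2. For the values of (4.10), both are close to c² = 1, i.e. the cautious control gives almost no reduction at either step.

```python
def cautious_two_step_cost(b_hat, P0, a, V, W, c):
    """Terms of (4.9) for the scalar example: returns (u_bar, P1_bar, cost_k1, future_k2)."""
    u_bar = -b_hat * c / (b_hat**2 + P0)                        # cautious control
    P1_bar = a**2 * P0 * W / (P0 * u_bar**2 + W) + V            # (4.6)
    s = b_hat**2 + P1_bar
    d2J = -2 * c**2 * P1_bar * (P1_bar - 3 * b_hat**2) / s**3   # (4.2)
    cost_k1 = c**2 - c**2 * b_hat**2 / (b_hat**2 + P0)          # first two terms of (4.9)
    future_k2 = (c**2
                 + 0.5 * d2J * a**2 * P0**2 * u_bar**2 / (P0 * u_bar**2 + W)
                 - c**2 * b_hat**2 / s)                         # last three terms of (4.9)
    return u_bar, P1_bar, cost_k1, future_k2


u_bar, P1_bar, cost_k1, future_k2 = cautious_two_step_cost(0.05, 0.5, 1.0, 0.1, 0.1, 1.0)
```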


Thus the cautious control applied at k = 0 results in no reduction in the cost at k = 1, due to the large uncertainty P(1). It also gives no reduction in the expected future cost, since ū(0) is small and no improvement in parameter accuracy occurs at step k = 1.

4.2  Evaluation  of  the  Dual  Controller 


The dual controller of (3.19)-(3.21), (4.1)-(4.6) can be evaluated by computing the average cost of (4.8) using the covariance

$$P_b(1) = \frac{a^2P_b(0)W}{P_b(0)u^{*2}(0) + W} + V \qquad (4.13)$$


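The expected future cost (4.7) reduces to

$$E\{J^*(1)\mid Y^0\} = c^2 - \frac{c^2\hat b^2(0)}{\hat b^2(0) + \bar P_b(1)} + \frac12\,\frac{\partial^2 J^*(1)}{\partial\hat b^2(1)}\,\frac{a^2P_b^2(0)\,u^{*2}(0)}{P_b(0)u^{*2}(0) + W} + \frac{\partial J^*(1)}{\partial P_b(1)}\left[\frac{a^2P_b(0)W}{P_b(0)u^{*2}(0) + W} - \frac{a^2P_b(0)W}{P_b(0)\bar u^2(0) + W}\right] \qquad (4.14)$$

and the total expected cost at k = 1 and k = 2 using (4.8) is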


$$J^*(0) = \Big[E\{x^2(1)\mid Y^0\} + E\{J^*(1)\mid Y^0\}\Big]_{u^*(0)} \qquad (4.15)$$

where

$$E\{x^2(1)\mid Y^0\}\Big|_{u^*(0)} = c^2 + 2\hat b(0)u^*(0)c + \big(\hat b^2(0) + P_b(0)\big)u^{*2}(0) \qquad (4.16)$$


Examination of (4.14) shows that the dual control can reduce the expected future cost relative to the cautious control, since the last two expressions in (4.14) can be negative if $u^{*2}(0) > \bar u^2(0)$. Thus the dual property can have a desirable effect on the future cost.

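The cost J*(0) is computed using the scalar example previously discussed for the cautious controller. A search procedure is applied to (4.15), using (4.14) and (4.16) with the parameter values from (4.10). u*(0) is iterated until it is in the vicinity of the minimum, yielding

$$\frac{\partial J^*(1)}{\partial P_b(1)}\Big|_{\bar u(0)=-.1} = .0075, \qquad \frac{\partial^2 J^*(1)}{\partial\hat b^2(1)}\Big|_{\bar u(0)=-.1} = -3.47,$$

$$\frac{\partial P_b(1)}{\partial u(0)}\Big|_{u^I(0)=-.6} = .382, \qquad \frac{\partial^2 P_b(1)}{\partial u^2(0)}\Big|_{u^I(0)=-.6} = +1.0, \qquad F_\ell = .87, \qquad f_\ell = .85 \qquad (4.17)$$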


The above sensitivities (4.17) were evaluated in the vicinity of the optimum, at $u^I(0) = -.6$ and $P_b(1) = .278$. The dual control u*(0), using $u^I(0) = -.6$ and c = 1, is

$$u^*(0) = -\frac{\hat b(0)\,c + .85}{\hat b^2(0) + P_b(0) + .87} = -.62 \qquad (4.18)$$


The corresponding expected future cost, using (4.14) and (4.17), is

$$E\{J^*(1)\mid Y^0\} \approx c^2 + \frac12\,\frac{\partial^2 J^*(1)}{\partial\hat b^2(1)}\,\frac{a^2P_b^2(0)\,u^{*2}(0)}{P_b(0)u^{*2}(0) + W} = .442 \qquad (4.19)$$
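(4.19) can be checked with the sensitivity reported in (4.17). The check below uses the reading of (4.19) given above and assumes that the search point $u^I(0) = -.6$ is substituted for u*(0), since the text does not state which value was used. The arithmetic is then:

```python
def future_cost_419(c, a, P0, W, d2J_db2, u):
    """Right-hand side of (4.19): c^2 + (1/2) d2J*/db^2 * a^2 P0^2 u^2 / (P0 u^2 + W)."""
    return c**2 + 0.5 * d2J_db2 * a**2 * P0**2 * u**2 / (P0 * u**2 + W)


value = future_cost_419(c=1.0, a=1.0, P0=0.5, W=0.1, d2J_db2=-3.47, u=-0.6)
print(round(value, 3))   # 0.442
```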

The result of this example shows that the dual control of (4.18) reduces the expected future cost to 44% of the original c² with no control, whereas the cautious control resulted in no reduction of the future cost. The terms responsible for the improvement with dual control are the second order sensitivities $\partial^2 P_b(1)/\partial u^2(0)$ and $\partial^2 J^*(1)/\partial\hat b^2(1)$.

The dual control of (4.18) differs from the cautious control (4.11) by the terms $F_\ell = .87$ in the denominator and $f_\ell = .85$ in the numerator. The denominator term in effect provides more "caution", whereas the numerator term is an additive probing effect. The term $F_\ell$ provides a "robustness" property, in that the sensitivities of the future cost to parameter uncertainties as they appear in the controller (i.e., $\hat b(0)$) are minimized. Thus a new interpretation of the dual control is that it contains robustness and learning (via probing). These concepts are applicable to the multivariable dual controller in (3.19)-(3.21).


5.  SIMULATION  RESULTS 

Performance was evaluated from 100 Monte Carlo runs for the following controllers, where the initial estimate $\hat b(0)$ was set to b(0) with covariance $P_b(0)$: 1) cautious controller, 2) FOD, 3) SOD.

The above algorithms were tested for two cases:

a) Time-varying case: b(0) = .05, $P_b(0)$ = 1.0, V = .1, c = 1.0, W = .01 and W = .1, a = 0.9.

b) Constant case: b(0) = .05, $P_b(0)$ = 1.0, V = 0, c = 1.0, W = .01 and W = .1, a = 1.0.
Example  a 

Table 1 summarizes the results of the simulation runs. All three algorithms were tested on this example for two different levels of measurement noise covariance, W = .01 and W = .1. 100 Monte Carlo runs of 40 time steps each were performed. For each run, an average cost was computed over the 40 time steps, and the averages over the 100 runs are tabulated in Table 1 and Table 2.
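A minimal Monte Carlo harness in the spirit of this setup is sketched below, for the cautious controller only. The FOD or SOD would replace the control line. The sketch assumes the scalar plant x(k+1) = c + b(k)u(k) with c known, b(k+1) = a b(k) + v(k), y(k+1) = x(k+1) + w(k+1), per-step cost x²(k+1), and an initial estimate equal to the true b(0). The function name and seed are illustrative, and the sketch is not expected to reproduce Table 1 or Table 2.

```python
import random


def cautious_monte_carlo(b0=0.05, Pb0=1.0, a=0.9, V=0.1, W=0.1, c=1.0,
                         runs=100, steps=40, seed=0):
    """Mean over runs of the per-run average cost x(k+1)^2 under the one-step cautious control."""
    rng = random.Random(seed)
    run_means = []
    for _ in range(runs):
        b, b_hat, P = b0, b0, Pb0                   # initial estimate set to the true b(0)
        total = 0.0
        for _ in range(steps):
            u = -b_hat * c / (b_hat**2 + P)         # cautious control
            x = c + b * u                           # scalar (2.1) with c known
            y = x + rng.gauss(0.0, W ** 0.5)        # (2.4)
            S = P * u**2 + W                        # innovation variance
            K = P * u / S                           # Kalman gain for b
            b_hat += K * (y - c - b_hat * u)        # measurement update
            P -= K * u * P
            b_hat, P = a * b_hat, a**2 * P + V      # prediction, (2.2)
            b = a * b + rng.gauss(0.0, V ** 0.5)    # true parameter evolves
            total += x**2
        run_means.append(total / steps)
    return sum(run_means) / runs


time_varying = cautious_monte_carlo()                   # case a), W = .1
constant = cautious_monte_carlo(a=1.0, V=0.0, W=0.01)   # case b), W = .01
```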

The tables clearly indicate that the SOD yields the lowest cost. The dual effect shows a larger improvement for larger measurement noise (i.e., W = .1). Runs 7 and 14 of the 100 Monte Carlo runs were selected for plotting, and their cost and parameter value are plotted in Figures 1 through 4. It is evident that the second order dual improves upon the other two on average.
Example  b 

In this case the true parameter was close to zero (i.e., b(0) = .05) but constant. Table 2 summarizes the results. The average cost obtained by the SOD is much lower than that of the other two. The SOD always exhibited excellent convergence, whereas the other controllers performed poorly. In addition, the new controller consistently avoided turn-off and burst [5]. This was an important common feature in all the Monte Carlo runs. Runs 26 and 80 are plotted in Figures 5 and 6, respectively, as typical examples.

The simulation study has shown that the new dual controller improves the cost on average. The magnitude of the average improvement appears to be relatively small for the noise levels used. However, the real advantage of the new dual controller is the improvement in those instances where the cautious controller and the FOD [1,2] yield unacceptable results. Although the FOD [1,2] shows improvement over the cautious controller, it has been found to be unacceptable at many time points.

6.  CONCLUSION 

A new adaptive dual control solution based upon the sensitivity functions of the expected future cost has been presented. This controller (SOD) takes better account of the dual effect by performing a second order Taylor series expansion of the expected future cost. The form of this controller is a modification of the one-step cautious controller. The FOD of [1,2] did not have the denominator correction term of the present controller. This term adds stability to the new control design. Simulation results for a scalar model have shown the improvement obtained using the new dual algorithm.
Fig. 1. Time history of cost comparing the SOD, FOD, and the cautious controller (time-varying parameter case: Run No. 7 of 100 Monte Carlo runs).

Fig. 2. Time history of the parameter for Run No. 7 of 100 Monte Carlo runs (time-varying case).

Fig. 3. Time history of cost comparing the SOD, FOD, and the cautious controller (time-varying parameter case: Run No. 14 of 100 Monte Carlo runs). [Plot: Run Number 14; W = .1, V = .1, b = .05, P = 1.0; curves: cautious, FOD, SOD.]

Fig. 4. Time history of the parameter for Run No. 14 of 100 Monte Carlo runs (time-varying case). [Plot: W = .1, V = .1, b = .05, P = 1.0.]

Fig. 5. Time history of cost comparing the SOD, FOD, and the cautious controller (constant parameter case: Run No. 26 of 100 Monte Carlo runs). [Plot: Run Number 26; W = .1, V = 0, b = .05, P = 1.0; curves: cautious, FOD, SOD.]

Fig. 6. Time history of cost comparing the SOD, FOD, and the cautious controller (constant parameter case: Run No. 80 of 100 Monte Carlo runs). [Plot: Run Number 80; W = .1, V = 0, b = .05, P = 1.0; curves: cautious, FOD, SOD.]


Measurement noise    Average cost
covariance W         Cautious    First order dual    Second order dual
.01                  .475        .469                .458
.1                   .623        .608                .514

Table 1. Average cost for the three controllers on the time-varying parameter model (b(0) = .05, $P_b(0)$ = 1, V = .1, c = 1).


Measurement noise    Average cost
covariance W         Cautious    First order dual    Second order dual
.01                  .109        .087                .069
.1                   .359        .250                .142

Table 2. Average cost for the three controllers on the constant parameter model (b(0) = .05, $P_b(0)$ = 1, V = 0, c = 1).